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Abstract 


Barrier  Methods  for  Large-Scale  Quadratic  Programming 

by 

Dulce  B.  Ponceleon 
October  1990 

VVe  present  several  new  algorithms  for  solving  the  general  large-scale  quadratic 
programming  (QP)  problem. 

A  feature  of  QP  problems  is  the  presence  of  linear  inequality  constraints,  which 
introduce  a  combinatorial  aspect  to  the  problem.  Currently  the  most  common  ap¬ 
proach  to  solving  QP  problems  is  to  apply  active-set  methods,  in  which  only  some  of 
the  inequalities  are  used  to  produce  a  search  direction  at  each  stage.  The  combina¬ 
torial  element  is  therefore  inherent.  As  problems  become  larger,  there  is  a  potential 
for  an  excessive  number  of  iterations  and  consequent  inefficiency. 

In  contrast,  we  use  the  now  familiar  barrier-function  approach,  which  circumvents 
the  combinatorial  aspect  by  introducing  a  barrier  transformation  involving  all  of  the 
inequalities.  The  barrier  term  enforces  satisfaction  of  the  inequalities  by  modifying 
the  objective  function.  The  transformed  problem  is  solved  by  a  modified  Newton 
method  applied  to  the  nonlinear  equations  defining  feasibility  and  optimality. 

The  main  computation  at  each  iteration  of  the  new  algorithms  is  the  solution 
of  an  indefinite  system  of  linear  equations.  Barrier  methods  are  known  to  lead  to 
ill-conditioned  systems.  However,  we  show  by  a  special  sensitivity  analysis  that  the 
particular  manner  in  which  we  have  formulated  the  barrier  transformation  leads  to 
ill-conditioning  that  is  benign. 

We  address  the  many  details  that  need  to  be  resolved  in  order  to  define  an  efficient 
algorithm  for  solving  large-scale  QP  problems.  A  specific  barrier  algorithm  has  been 
implemented,  with  linear  programming  (I  P)  included  as  a  special  case.  Numerical 
results  are  presented  for  a  set  of  sparse  QP  test  problems.  A  feature  of  the  imple¬ 
mentation  is  that  its  efficiency  does  not  depend  on  whether  the  solution  is  near  or  far 
from  a  vertex  of  the  feasible  region. 
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Chapter  1 
Introduction 


1.1  Statement  of  the  Problem 

In  this  thesis  we  present  several  new  algorithms  to  solve  the  general  quadratic  program 
(QP)  of  minin'i  ing  a  quadratic  objective  function  subject  to  linear  constraints  on 
the  variables.  Quadratic  programming  problems  are  among  the  best  understood 
areas  of  optimization,  with  algorithms  for  them  dating  back  to  the  1950’s.  Many 
mathematically  equivalent  formulations  of  the  problem  are  possible,  and  the  choice 
depends  mainly  on  the  context.  It  will  be  seen  later  that  a  very  specific  form  of  the 
problem  is  crucial  to  the  barrier  algorithms  presented  in  this  dissertation,  but  initially 
we  shall  consider  quadratic  programs  in  the  following  form: 


minimize 
i  esn 

Q(x )  =  cTx  4-  \xtHx 

subject  to 

Ax  > 

(1.1.1) 


where  the  Hessian  matrix  H  of  the  quadratic  function  is  an  n  x  n  symmetric  matrix, 
c  is  an  n-vector,  A  is  an  m  x  n  matrix,  and  /?  is  an  m-vector.  We  are  particularly 
interested  in  the  case  where  the  matrices  A  and  H  are  large  and  sparse,  and  we 
include  the  general  linear  programming  (LP)  problem  arising  when  H  =  0. 

We  shall  assume  at  least  one  bounded  solution  to  (1.1.1)  exists,  say  r*.  When 
H  is  positive  semidefinite,  the  problem  is  termed  a  convex  QP.  A  useful  property  of 
a  convex  QP  is  that  a  local  minimizer  x*  of  a  convex  QP  is  also  a  global  minimizer. 
When  H  is  positive  definite,  x*  is  unique.  When  H  is  indefinite,  more  than  one 
local  solution  may  exist.  Moreover,  some  of  the  local  solutions  may  not  be  global. 
Naturally  these  characteristics  of  the  solution  have  an  important  impact  on  the  design 
of  an  algorithm. 

The  problem  of  finding  a  global  minimizer  of  a  general  optimization  problem  is 
very  difficult  and  the  indefinite  quadratic  case  is  no  exception.  It  can  be  shown  that 
finding  a  global  minimizer  of  an  indefinite  QP  is  equivalent  to  finding  an  integer  so¬ 
lution  to  a  linear  program,  a  known  difficult  problem.  Hence,  most  algorithms  do  not 
attempt  to  find  more  than  a  local  minimizer.  Indeed,  even  the  verification  that  some 
point  is  a  local  minimizer  of  a  general  quadratic  program  is  under  sone  circumstances 
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an  NP-hard  problem — see  Contesse  [Con80],  Murty  and  Kabadi  [MK87]  and  Pardalos 
and  Schnitger  [PS88]. 

Figure  1.1  shows  the  contours  of  a  strictly  convex  quadratic  function  and  the 
feasible  region  defined  by  the  linear  constraints.  The  shaded  area  corresponds  to  the 
infeasible  region.  In  this  example  the  solution  x*  is  unique,  i.e.  it  is  a  strong  global 
minimizer.  Figure  1.2  shows  the  contours  of  a  nonconvex  quadratic  function.  This 
function  has  two  isolated  local  minimizers,  only  one  of  which  is  a  global  minimizer. 

In  contrast  with  a  typical  linear  programming  problem,  the  solution  to  either 
problem  does  not  lie  at  a  vertex  of  the  feasible  region.  Notice  that  if  the  minimizer  x* 
was  in  the  interior,  it  would  imply  that  x*  was  also  a  minimizer  of  the  unconstrained 
problem,  or  equivalently,  the  linear  constraints  would  not  play  any  role.  In  general, 
the  minimizer  of  a  constrained  QP  is  expected  to  lie  on  a  boundary  of  the  feasible 
region,  although  not  necessarily  at  a  vertex. 

For  large-scale  problems,  very  few  practical,  general-purpose  algorithms  are  cur¬ 
rently  available.  Almost  all  current  methods  for  QP  whether  for  large  or  small  prob¬ 
lems  are  of  a  type  known  as  active-set  methods.  In  this  thesis  we  describe  algorithms 
for  QP  that  are  not  active-set  methods.  In  developing  a  large-scale  QP  algorithm  of 
a  radically  different  nature,  we  avoid  some  of  the  fundamental  drawbacks  present  in 
active-set  methods.  Certain  different  difficulties  exist,  but  we  anticipate  that  at  least 
for  some  problems  the  balance  will  favor  the  new  approach. 
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1.2  Applications 

Quadratic  programs  arise  from  many  diverse  applications.  For  instance,  the  well 
known  PILOT  problem  [SMD86]  is  a  large-scale  dynamic  model  of  the  U.S.  economy 
that  synthesizes  representations  of  all  sectors  of  the  economy  in  a  general  equilib¬ 
rium  context.  PILOT  simulates  economic  interactions  among  sectors  by  determining 
market-clearing  prices  and  quantities  for  all  commodities  over  time.  The  Hessian 
matrix  is  diagonal  and  semidefinite.  Approximately  one  percent  of  the  variables  con¬ 
tribute  to  the  quadratic  term. 

Portfolio  Optimization  in  the  business  world  constitutes  another  source  of  quadratic 
programs.  Specific  applications  include  asset  allocation  (such  as  stocks  and  bonds), 
and  optimizing  risk-return  tradeoffs  assuming  superior  investment  judgement.  Many 
investment  advisory  firms  and  pension-plan  sponsors  today  routinely  compute  mean- 
variance  efficient  portfolios  as  part  of  the  portfolio  allocation  process.  Given  that 
investment  companies  use  large-scale  portfolio  models  to  trade  billions  of  dollars ,  the 
development  of  a  robust  large-scale  QP  algorithm  is  of  paramount  importance. 

As  well  as  being  important  in  their  own  right,  quadratic  programs  often  appear 
as  subproblems  within  algorithms  for  nonlinear  programming.  Our  interest  in  sparse 
quadratic  programs  arises  in  part  from  the  desire  to  apply  sequential  quadratic  pro¬ 
gramming  (SQP)  methods  to  large  nonlinearly  constrained  problems.  As  the  name 
suggests,  each  iteration  of  an  SQP  method  involves  solving  a  quadratic  programming 
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subproblem.  These  methods  are  regarded  as  state-of-the-art  methods  for  the  solution 
of  nonlinear  problems.  The  overall  performance  depends  critically  on  the  efficient 
solution  of  the  QP  subproblems. 

Naively,  one  might  think  that  to  develop  an  SQP  method  for  solving  nonlinear 
programs,  it  would  suffice  to  have  a  black  box  for  solving  the  quadratic  subproblems. 
Unfortunately,  this  is  not  true  even  in  the  dense  case;  see  Gill  et  al.  [GMSW85]  for 
a  detailed  discussion  of  the  difficulties  involved  in  adapting  active-set  QP  algorithms 
to  the  SQP  environment. 

1.3  Historical  Background 

Given  the  importance  of  quadratic  programming,  it  is  not  surprising  that  many  algo¬ 
rithms  have  been  proposed.  Some  of  these  were  derived  as  extensions  of  the  simplex 
method  for  linear  programming,  in  the  sense  that  they  involved  pivot  operations  on 
a  simplex-type  tableau.  These  include  Wolfe’s  simplex  method  for  QP  [Wol59]  and 
some  of  the  algorithms  for  linear  complementarity  problems— notably  the  principal 
pivoting  method  of  Cottle  and  Dantzig  [CD68,Cot6S,CotS9]  and  Lemke’s  method 
[Lem62]. 

Beale’s  method  [Bea55,Bea59]  was  one  of  the  more  successful  early  approaches.  It 
has  the  feature  of  tending  to  be  more  efficient  if  Q(x)  is  linear  in  most  of  the  variables 
(i.e.  if  II  has  low  rank).  In  many  early  QP  algorithms  it  was  necessary  for  H  to  be 
positive  definite  and  such  algorithms  were  therefore  restricted  in  their  application. 
In  particular,  they  were  unsuited  to  problems  derived  as  extensions  to  LP  models — a 
common  source  of  QP’s. 

The  equality-constrained  quadratic  program  (EQP)  constitutes  the  simplest  QP. 
In  this  case,  the  constraints  are  of  the  form  Ax  =  b.  A  distinctive  feature  of  an  EQP 
:s  that  its  solution,  if  it  exists,  can  be  obtained  by  c  jiving  a  single  system  of  linear 
equations  (known  as  the  Iiarush-Kuhn ■  Tucker  or  KKT  system)  involving  the  matrix 


No  iterative  proceduie  is  needed.  A  variety  of  numerical  methods  have  been  developed 
to  compute  the  solution  of  EQP.  The  major  differences  among  them  arise  from  the 
numerical  techniques  used  to  solve  the  system  of  linear  equations.  The  method  of 
choice  depends  on  the  size  and  representation  of  the  problem. 

One  approach  is  to  use  a  factorization  of  K  itself — sometimes  referred  to  in  the 
literature  as  Lagrangian  or  Karush-Kvhn-Tucker  methods.  Other  approaches,  the 
so-called  projection  methods,  involve  breaking  down  the  system  into  smaller  and 
simpler  systems.  The  factorizations  needed  arc  then  smaller  than  in  Lagrangian-type 
methods,  but  less  likely  to  be  sparse. 
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The  constraint  matrix  A  can  be  viewed  as  defining  two  complementary  subspaces: 
the  null  space  of  vectors  orthogonal  to  the  rows  of  A,  and  the  range  space  of  vectors 
spanned  by  the  rows  of  A.  From  this  observation,  two  categories  of  projection  methods 
have  been  developed,  namely  null-space  methods  and  range-space  methods.  In  many 
cases,  the  work  required  to  solve  EQP  is  directly  proportional  to  the  dimension  of  the 
corresponding  subspace. 

EQP  methods  are  fundamental  for  solving  subproblems  that  arise  in  active-set 
methods  for  the  inequality-constrained  quadratic  program  (IQP).  Unlike  an  EQP,  an 
IQP  can  be  solved  only  by  iteration.  In  the  IQP  case,  a  key  piece  of  information  is 
unknown — namely,  the  set  of  constraints  that  hold  with  equality  at  the  solution.  Such 
constraints  are  termed  the  active  set  of  constraints  at  the  solution  or  simply  the  active 
set.  If  the  active  set  were  known  a  priori ,  the  solution  of  an  IQP  could  be  computed 
directly  by  solving  a  single  EQP  problem  (ignoring  the  inactive  constraints). 

In  the  1970’s  and  80’s  a  number  of  sophisticated  active-set  methods  were  devel¬ 
oped.  Such  methods  are  so-called  because  they  maintain  a  prediction  of  the  active 
set,  called  the  working  set.  This  is  a  linearly  independent  set  of  constraints  that  are 
satisfied  exactly  at  the  beginning  of  each  iteration.  Information  is  gathered  at  the 
current  iterate  to  allow  changes  in  the  working  set  to  improve  the  prediction.  Itera¬ 
tions  proceed  until  the  active  set  is  identified.  It  is  common  for  any  given  constraint 
to  enter  and  leave  the  working  set  many  times.  Although  the  working  sets  may  not 
repeat,  the  number  of  possibilities  is  a  combinatorial  function  of  the  number  of  con¬ 
straints  and  variables.  Early  active-set  methods  are  those  due  to  Dantzig  and  Wolfe 
[DanGlj,  Fletcher  [Fle71],  and  Murray  [Mur71a].  More  recent  methods  are  due  to 
Gill  and  Murray  [GM78],  Powell  [PowSl],  Goldfarb  and  Idnani  [GI83],  and  Gill  et  al. 
[GMSW84c]. 

Naturally,  some  of  the  IQP  methods  are  classified  according  to  the  type  of  method 
they  use  to  solve  the  EQP  subproblems.  For  example,  the  methods  of  Murray 
[Mur71a],  Gill  and  Murray  (GM78],  and  Bunch  and  Kaufman  [BK80]  are  null-space 
methods,  and  are  more  efficient  when  the  number  of  constraints  in  the  working  set 
is  close  to  n,  since  the  dimension  of  the  null  space  is  then  relatively  small.  Range- 
space  methods  increase  in  efficiency  as  the  number  of  constraints  in  the  working  set 
decreases.  For  an  example  of  a  range-space  method,  see  Gill  et  d.  [GGM*S4]. 

We  emphasize  that  under  certain  conditions  (for  example,  when  H  is  positive 
definite),  many  active-set  methods  for  solving  IQP  are  mathematically  equivalent ,  in 
the  sense  that  they  calculate  identical  sequences  of  iterates  when  applied  to  the  same 
problem  with  the  same  initial  working  set.  (See  Djang  [Dja79],  Gottle  and  Djang 
1CD79J  and  Best  (BesSC) 

Approaches  that  treat  the  indefinite  system  (1.3.1)  directly,  may  be  especially 
appropriate  for  sparse  problems;  see  Duff  and  Reid  [DRS2],  Forsgicn  and  Murray 
[FM90]  and  Gill  et  al.  [GMSW84b,GMSW87]. 
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An  important  class  of  active-set  methods  are  the  so-called  inertia-controlling  algo¬ 
rithms  for  QP  (ICQP  methods).  Such  methods  restrict  the  change  that  can  take  place 
between  successive  working  sets,  placing  a  strict  control  on  the  degree  of  indefiniteness 
of  the  reduced  Hessian  matrix  (a  measure  of  nonconvexity).  Lagrangian,  null-space 
and  range-space  methods  may  all  be  ICQP  methods.  A  complete  description  of  ICQP 
methods  is  given  by  Gill  et  al.  [GMSW88]. 

Let  Ak  denote  the  matrix  of  constraints  defining  the  working  set  at  iteration  k. 
All  active-set  method®  can  be  viewed  as  solving  a  sequence  of  linear  systems  involving 
matrices  of  the  form 


i.e.,  a  sequence  of  EQF  problems.  A  feature  of  most  active-set  methods  is  that  Ak 
differs  little  from  Ak+i.  Advantage  may  be  taken  of  this  feature  by  utilizing  matrix 
factorization  updating  techniques  [GGMS74]. 

While  active-set  approaches  are  well  suited  to  many  problems,  the  potential  ex¬ 
ists  for  the  number  of  iterations  tc  oe  extremely  large  even  for  problems  of  moderate 
size.  In  fact,  in  the  simplex  method  for  linear  programming  [Dan63] — an  active-set 
method — the  number  of  iterations  in  the  worst  case  can  rise  exponentially  with  the 
number  of  constraints  [KM72].  While  it  is  well  known  for  the  simplex  method  that 
this  worst  case  is  rarely  attained,  the  combinatorial  aspect  of  active-set  methods  im¬ 
plies  that  the  number  of  iterations  often  does  grow  significantly  with  the  number  of 
constraints.  The  wish  to  circumvent  this  feature  of  active-set  methods  is  what  moti¬ 
vates  our  interest  in  barrier  methods,  just  as  it  motivated  Karmarkar’s  approach  to 
large-scale  linear  programming  [Kar34].  Another  reason  is  the  difficulty  in  extending 
matrix  updating  procedures  to  the  large-scale  case. 


1.4  Barrier-function  Methods 

The  use  of  barrier  methods  for  solving  nonlinear  optimization  oiems  dates  back 
to  the  !?.ic  lS50:s,  and  was  subsequently  well  established  in  the  1960’s  by  Fiacco  and 
McCormick  [FM68j.  It  can  be  appreciated  from  the  preced.^g  discussion  that  the 
-ombinatoria!  element  in  active-set  method®  arises  because  of  the  presence  of  incrpiol- 
ity  constraints.  A  barrier  method  eliminates  inequality  constraints  by  transforming 
the  objective  function.  For  example,  the  problem 

minimize  Fix'* 

'  (1.4.1) 

subject  to  c(i.)  >0 
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(with  c  6  3£m)  might  be  transformed  into  the  problem 

•*_  1 

minirnize  B(x,fi)  =  F(x )  .  — — (1.4.2) 

where  /r  >  0  is  a  hairier  parameter  that  weigi.  h-  :  -•  jf  the  perturbation  to  the 
original  function.  The  general  idea  of  the  classical  j  --  ier  approach  was  to  transform 
the  original  constrained  problem  into  a  sequence  o  unconstrained  problems  whose 
successive  minimizers  converge  to  the  desired  sJ.u  .  ISote  that  barrier  methods 
are  applicable  only  to  problems  for  which  a  slricth  "  'e  point  exists,  i.e.,  c(x)  >  0. 

In  general,  minimizers  of  F(x )  will  be  infeasib.  ,  or  F[x)  might  be  unbounded 
below.  In  order  to  enforce  feasibility  of  successive  iterates,  a  modified  function  is 
introduced  that  resembles  the  original  objective  function  but  has  an  additional  term 
that  sharply  approaches  infinity  at  each  of  the  constraints.  1  hat  is,  a  barrier  is  created 
at  the  boundary  of  the  feasible  region.  If  the  feasible  region  is  compact  there  must 
exist  a  feasible  (with  respect  to  the  constraints  of  the  original  problem)  minimizer 
of  the  modified  function.  If  the  weight  assigned  to  the  barrier  term  is  decreased 
toward  zero,  there  exists  a  sequence  of  minimizers  that  constitutes  a  strictly  feasible 
sequence  of  approximations  to  a  constrained  minimizer  of  the  original  problem.  It 
can  be  appreciated  from  the  example  (1.4.2)  that  the  smaller  the  value  of  fi  the 
closer  B(x,p)  approximates  F(x).  However,  no  matter  how  small  /*,  very  close  >.o  the 
boundary  of  the  feasible  region  the  two  functions  will  difTer. 

If  the  initial  estimate  of  the  minimizer  of  (1.4.2)  is  feasible  with  respect  to  the 
constraints  c(x)  >  0,  then  an  algorithm  to  solve  (1-4.2)  can  usually  be  adjusted  to 
generate  a  strictly  feasible  sequence  of  estimates. 

Figures  1.3  and  1.4  illustrate  the  effect  of  applying  a  barrier  transformation  to  a 
problem  with  one  nonlinear  constraint.  Figure  1.3  shows  the  contours  of  the  function 
F(x)  =  xxx\  and  the  contour  line  corresponding  to  a  zero  value  of  the  constraint 
c(,t)  =  2  —  xf  —  x\  >  0.  In  Figure  1.4  no  contours  of  the  ba'iicr  function  have  been 
d.rawn  in  the  i.ifeabilc  region,  since  the  function  is  not  meant  to  be  evaluated  in  that 
region. 

The  two  most  popular  barrier  functions  are  the  logarithmic  barrier  function  at¬ 
tributed  to  Frisch  [Frioo],  and  the  inverse  barrier  function  (used  in  (1.4.2))  introduced 
by  Carroll  (Car59,CarClj.  Here  •-«:&  consider  only  the  logarithmic  barrier  function.  Let 
x*  be  a  minimizer  of  (1  and  let  x*(fi)  be  the  minimizer  of  B(x,/i)  in  (1.4.2)  :br 
a  given  v-due  of  ,i.  Fiacco  and  McCormick  [FM6S]  have  shown,  for  a  wide  class  of 
bav.-tcr  functions  and  under  quite  general  conditions  on  F[x)  and  c(.r).  that  there 
exists  compact  set  containing  x*  within  which  the  sequence  {x* (/t)}  converges  to 

■V*  ?  .  ;>  -»  0. 

In  the  IpCu's,  the  popularity  of  these  methods  grew.  The  attractiveness  was  due 
in  carl  to  the  sophistication  of  algorithms  then  known  for  unconstrained  problems 
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Figure  1.4:  Barrier  Transformation  Applied  to  a  Nonlinear  Problem 
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compared  to  those  known  for  constrained  problems.  Once  the  problems  were  trans¬ 
formed  to  become  unconstrained,  “off- the- shelf’  software  could  (in  taeory)  be  used 
for  their  solution. 

Since  techniques  for  nonlinear  constraints  were  not  well  understood  at  the  time, 
their  removal  was  specially  advantageous.  Another  desinble  feature  of  the  barrier 
transformation  is  that  it  couples  the  original  objective  function  and  the  inequality 
constraints  in  a  manner  that  eliminates  motion  along  the  boundary.  Hence,  no  special 
techniques  are  needed  to  move  along  the  boundary.1  However,  solution  of  the  barrier 
subproblems  proved  difficult  and  in  the  1970’s  interest  in  barrier  methods  declined. 
One  reason  for  the  difficulty  is  that  the  Hessian  at  the  solution  becomes  progressively 
more  ill-conditioned  as  n  — *  0  [Mur71b].  Most  methods  for  unconstrained  problems 
perform  poorly  on  ill-conditioned  problems.  For  example,  Newton’s  method  has  a 
quadratic  rate  of  convergence  only  if  the  Hessian  is  nonsingular  (otneiwise  the  rate  is 
just  linear).  Thus,  although  a  constrained  problem  has  been  reduced  to  a  sequence 
of  unconstrained  problems,  the  latter  are  of  the  type  that  are  difficult  to  solve.  We 
should  emphasize  that  the  ill-conditioning  is  a  feature  of  the  barrier  transformation 
and  is  independent  of  the  condition  of  the  original  optimization  problem.  For  an 
approach  that  circumvents  this  ill-conditioning,  see  [Wri76]. 

Interest  in  barrier  methods  has  been  revived  recently,  following  the  discovery  that 
the  much-publicized  Karmarkar  algorithm  for  linear  programming  is  in  fact  equivalent 
to  a  particular  type  of  barrier  method  [Kar84,GMS*86].  It  is  now  clear  that  the 
removal  of  inequality  constraints  can  be  advantageous  even  for  linear  programming, 
since  it  avoids  the  combinatorial  element  of  finding  the  active  set. 

It  turns  out  that  linear  programming  is  a  special  case  and  that  for  such  problems 
the  usual  ill-conditioning  associated  with  barrier  functions  is  not  present  (assuming 
the  problem  is  not  dual  degenerate).  The  reason  is  that  in  LP’s  the  number  of 
constraints  active  at  the  solution  is  usually  at  least  equal  to  the  number  of  variables. 
In  general,  this  property  does  not  hold  for  QP  and  other  nonlinear  problems.  At  first 
sight,  therefore,  the  extension  of  the  new  approaches  to  LP  to  nonlinear  problems 
does  not  look  promising.  However,  we  have  been  able  to  show  that  provided  the 
barrier  transformation  is  applied  in  a  certain  manner,  the  inevitable  ill-conditioning 
is  benign. 

The  crucial  element  is  to  apply  the  barrier  transformation  to  simple  bounds  only, 
instead  of  to  general  inequalities.  For  example,  constraints  of  the  form  Ax  >  b  are 
replaced  by  the  equivalent  constraints  Ax  —  s  =  b  and  $  >  0.  The  bounds  s  >  0  are 
then  eliminated  by  the  barrier  transformation.  The  barrier  subproblems  now  have 
linear  equality  constraints.  In  general  this  is  not  a  drawback  since  many  problems 
will  already  have  some  naturally  occurring  equality  constraints. 


‘Motion  along  the  boundary  is  undesirably  complex  when  the  constraint  surface  is  nonlinear. 
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1.5  Large-scale  Quadratic  Programming 

While  general  algorithms  for  quadratic  programming  are  well  understood,  computa¬ 
tionally  attractive  algorithms  for  large-scale  quadratic  programming  are  scarce.  The 
definition  of  a  large  problem  naturally  depends  on  the  size  of  the  machine  being  used. 
It  is  generally  agreed  that  large  problems  involve  a  few  thousand  variables  and  con¬ 
straints.  When  problems  of  this  type  arise  there  is  often  considerable  structure  to 
both  the  Hessian  and  constraint  matrices.  In  particular,  the  matrices  typically  have 
an  average  of  only  5  to  10  nonzero  elements  per  column  [LusS9].  Enormous  savings 
in  computational  effort  as  well  as  storage  can  be  achieved  if  the  QP  algorithm  takes 
advantage  of  such  structure. 

Within  the  last  15  years  a  few  algorithms  have  appeared  for  large-scale  quadratic 
programming.  These  include  Tomlin’s  sparse  implementation  of  Lemke’s  method 
[Tom76b,Tom76c,Tom76a],  and  Gould’s  method  [Gou86a].  The  latter  is  an  active-set 
method  that  uses  a  sparse  solver  for  matrices  derived  from  KKT  systems. 

Among  the  best  known  algorithms  for  large-scale  optimization  is  the  MINOS 
system  due  to  Murtagh  and  Saunders  (MSS3].  While  designed  for  general  nonlinear 
programming,  MINOS  can  be  applied  to  quadratic  programs.  For  linearly  constrained 
problems,  an  active-set  method  is  used.  A  characteristic  is  that  advantage  is  taken 
of  sparsity  in  the  constraint  matrix  but  not  in  the  Hessian.  Instead,  MINOS  works 
with  a  dense  approximation  to  the  so-called  reduced  Hessian,  and  relies  on  the  fact 
that  for  many  problems  this  matrix  is  relatively  small.  If  the  reduced  Hessian  is  of 
low  dimension  (say  less  than  200),  the  algorithm  is  quite  efficient.  The  effectiveness 
of  the  method  decreases  as  the  size  of  the  reduced  Hessian  increases.  We  defer  a  more 
complete  description  to  Chapter  8,  where  the  performance  of  the  barrier  method  is 
compared  with  MINOS. 


1.6  Complexity  Results 

For  an  introduction  to  computational  complexity,  the  theory  of  NP-completencss 
and  its  significance  within  this  context,  techniques  for  analyzing  the  complexity  of 
algorithms,  and  practical  approaches  to  some  intractable  problems,  see  [GJ79,PS82], 
For  a  study  of  the  computational  complexity  of  fundamental  processes  in  numerical 
computation,  see  [Kar74]. 

Barrier  methods  belong  to  the  class  of  interior-point  methods,  in  which  inequality 
constraints  are  strictly  satisfied  throughout. 

A  key  theoretical  property  of  certain  interior-point  methods  for  LP  and  convex  QP 
is  that  they  are  polynomial-time  algorithms',  i.e.,  the  number  of  iterations  required  to 
reach  a  specified  accuracy  in  the  solution  is  a  polynomial  function  of  n,  the  number 
of  variables. 
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For  example,  for  linear  programs  and  strictly  convex  QP’s,  the  duality  gap  7  =  xTz 
is  one  measure  of  distance  from  a  solution,  assuming  x  and  z  satisfy  the  primal  and 
dual  constraints  respectively.  For  a  primal-dual  algorithm  that  maintains  such  points 
(x.z),  a  typical  complexity  result  might  state  the  following.  For  a  given  accuracy 
parameter  6 ,  if  70  =  XqZ0  is  the  duality  gap  for  some  initial  primal  and  dual  feasible 
point  (ar0)  20),  the  number  of  iterations  required  to  reduce  the  duality  gap  to  a  specified 
value  7  >  0  is  at  most  N.  where  jV  satisfies  a  bound  of  the  form 

N  <  C\fn  log(7o/^), 


for  a  moderate  constant  C. 

Further  complexity  results  are  often  given  in  terms  of  L,  the  “length”  or  “size”  of 
the  data  specifying  the  problem.  Typically, 

£  =  EP°&(I*I  + 1)1. 

where  d,  ranges  over  all  nonzero  elements  of  A,  6,  c  and  //.  In  other  words,  L  is  the 
number  of  bits  of  data  needed  to  specify  the  problem.  In  many  cases  it  is  assumed 
that  the  data  elements  d,  are  rational. 

If  the  accuracy  is  required  to  be  7  <  2~2L,  it  may  be  possible  to  state  for  some 
algorithm  that  in  the  optimal  solution  ( x*,z *), 

x*  =  0  if  Xj  <  2~l  and  z*  =  0  if  Zj  <  2~l . 

The  vemair.i.ig  components  of  the  solution  can  be  computed  in  0(n3)  operations. 
Thus,  the  co  lion 

xTz  <  2~2L 

could  theoretic  >y  be  used  as  a  stopping  criterion  for  the  algorithm. 

Karma rlca:  [  Car84]  was  the  first  to  prove  polynomiality  for  an  interior-point 
method  for  LI\  Many  such  algorithms  have  since  been  developed  for  LP,  convex 
QP,  and  linear  complementarity  problems.  They  are  variously  known  as 

•  potential-reduction  algorithms, 

•  path-following  algorithms, 

•  barrier  algorithms, 

•  affine-scaling  algorithms, 

•  projective-scaling  algorithms. 
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It  should  be  emphasized  that  all  these  algorithms  are  closely  related  despite  the  many 
different  names. 

The  main  LP/QP  algorithm  developed  and  implemented  here  is  of  the  barrier  type. 
For  the  special  case  of  LP,  polynomiality  was  proved  by  Gonzaga  [Gon87]  and  by  Roos 
and  Vial  [RVSS].  Ip.  the  latter  case,  the  barrier  parameter  p  is  reduced  occasionally 
by  an  arbitrary  constant  factor  in  the  range  (0, 1).  The  algorithms  presented  in  this 
thesis  are  similar  in  spirit  to  this  approach. 

A  great  deal  of  background  and  unifying  theory  has  recently  been  given  by  Kojima 
et  ul.  [KMNY90]  on  interior-point  algorithms  for  linear  complementarity  problems, 
including  LP  and  convex  QP.  Their  results  apply  to  primal-dual  algorithms  such  as 
those  described  in  Chapter  6. 

Further  complexity  results  for  convex  QP  algorithms  have  been  given  by  Ye  [Ye87] 
and  by  Ye  and  Tse  [YT86]. 

While  proofs  of  polynomiality  are  of  considerable  theoretical  interest,  the  theoret¬ 
ical  bounds  that  have  been  obtained  are  enormously  greater  than  the  actual  numbers 
of  iterations  that  are  typically  achieved  in  practice.  If  this  were  not  the  case,  barrier 
methods  would  be  of  little  practical  value.  Some  steps  towards  bounding  the  expected 
number  of  iterations  have  been  taken  recently  by  Todd  et  al.  (TMY90),  but  again  for 
the  LP  case  only. 

1.7  Thesis  Contents 

Elsewhere,  the  principal  form  of  research  on  interior- point  methods  for  quadratic 
programming  has  been  to  investigate  their  complexity  properties  {for  example,  see 
Kojima  et  al.  (KMNY90J).  In  this  thesis  the  focus  is  on  defining  practical  algorithms 
and  facing  the  many  numerical  issues  that  arise  (c/.  [Mch$9,Meh90,LMSS9.LMS90, 
GLMS90]).  A  key  question  that  is  not  addressed  in  the  basic  theory  of  barrier  func¬ 
tions  is  how  to  determine  (or  approximate)  the  solution  of  the  subproblem.  Much  of 
the  thesis  is  directed  at  answering  this  question. 

Although  our  prime  interest  is  in  solving  large-scale  QP5s,  much  of  the  work 
presented  is  equally  applicable  to  large-scale  optimization  problems  whose  objective 
function  is  an  arbitrary  nonlinear  function  that  is  twice  continuously  differentiable. 

In  Chapter  2  we  state  the  necessary  and  sufficient  conditions  for  a  solution  of  a 
general  quadratic  program.  For  reference  purposes  we  present  several  ways  of  formu¬ 
lating  a  QP  and  introduce  the  concept  of  duality  within  the  quadratic  programming 
context.  We  briefly  describe  active-set  methods,  one  of  the  most  popular  class  of 
methods  for  quadratic  programming,  wc  motivate  the  interest  in  new  approaches, 
such  as  barrier  methods,  by  discussing  some  of  the  complications  that  arise  in  de¬ 
veloping  active-set  methods  for  large-scale  QP’s.  Specifically,  we  show  the  need  to 
circumvent  the  combinatorial  aspect  present  in  such  methods. 
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In  Chapter  3  we  review  the  fundamental  properties  of  barrier  functions  and  the 
classical  theory  of  barrier  methods  for  the  solution  of  a  general  nonlinear  problem 
with  inequality  contraints  (NIP).  Optimality  conditions  for  a  minimizer  of  NIP  are 
given,  and  we  discuss  the  relationship  of  minimizers  of  the  barrier  subproblems  to 
those  of  NIP.  A  model  barrier  algorithm  is  presented  for  the  solution  of  nonlinear 
programs  subject  to  a  mixture  of  linear  equality  and  general  inequality  const -aints. 

In  Chapter  4  we  discuss  algorithms  for  finding  or  approximating  minimizers  of 
the  barrier  subproblem  (the  minimization  of  a  nonlinear  function  subject  to  linear 
equality  constraints).  It  wiU  be  seen  that  for  the  large-scale  case,  such  algorithms 
are  both  rare  and  complex.  Optimality  conditions  for  the  solution  of  the  barrier 
subproblcm  are  given. 

Regardless  of  the  method  used  to  solve  the  subproblcm,  there  arc  some  special 
difficulties  not  present  in  the  usual  equality-constrained  optimization  problem.  For 
example,  with  a  norma!  problem  it  is  trivial  to  provide  an  initial  point  that  satisfies 
the  linear  equality  constraints,  but  here  the  point  must  also  be  strictly  interior  with 
respect  to  the  bounds.  We  propose  an  alternative  to  a  general-purpose  algorithm  that 
takes  specific  advantage  of  the  form  of  the  barrier  subproblcm. 

We  ;ire  also  interested  ir.  exactly  how  the  sequence  may  be  computed  efficiently.  A 
key  issue  in  algorithms  for  large-scale  problems  is  how  best  to  solve  the  linear  systems 
that  define  the  iterative  process.  A  feature  of  barrier  algorithms  for  LP  is  that  it  is 
jwssible  to  define  the  sequence  by  solving  symmetric  positive  definite  systems.  As  it 
happens,  algorithms  to  solve  such  systems  are  highly  developed.  In  the  QP  case  the 
systems  arc  symmetric  but  indefinite.  Such  systems  are  inherently  more  difficult  to 
solve  (in  part  because  numerical  stability  depends  upon  choosing  a  suitable  ordering 
of  the  equations).  We  describe  procedures  to  solve  such  systems. 

Our  interest  is  not  simply  in  defining  an  iterative  sequence  that  converges  to  a  QP 
solution  but  also  in  showing  that  when  the  sequence  is  computed  in  finite  precision 
it  converges  to  the  solution  of  a  neighboring  problem. 

To  prove  that  the  proposed  barrier  algorithm  is  numerically  viable,  we  must  also 
show  that  the  solution  to  the  subproblems  can  be  computed  in  a  stable  fashion.  To 
this  end.  Chapter  5  presents  a  detailed  sensitivity  analysis  of  the  KKT  system  arising 
from  barrier  methods.  A  key  result  of  this  analysis  is  that  the  ill-conditioning  normally 
associated  with  the  Hessian  in  barrier  methods  is  benign  when  the  logarithmic  barrier 
function  is  applied  to  a  particular  formulation  of  a  quadratic  program.  When  only 
simple  bounds  are  enforced  by  the  barrier  functions,  the  subproblems  can  be  solved 
to  the  required  accuracy  in  spite  of  the  severe  ill-conditioning.  Thus,  a  stable  barrier 
algorithm  is  realized. 

In  Chapter  6  we  study  some  special  cases  of  quadratic  programs  and  the  design 
of  practical  methods  that  take  advantage  of  the  special  properties  of  such  problems. 
Specifically  we  consider  the  following  special  cases:  when  the  only  constraints  present 
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are  simple  bound  constraints,  when  the  Hessian  is  trivially  invertible,  and  when  the 
reduced  Hessian  is  positive  semidefinite.  The  latter  case  exhibits  the  key  property 
that  the  minimizer  of  the  barrier  subproblem  is  unique.  For  such  QP’s  it  is  possible 
to  develop  primal,  dual  and  primal-dual  methods.  The  convergence  of  these  special 
methods  is  discussed.  In  the  linear  programming  context,  affine  methods  have  been 
proposed.  We  investigate  whether  similar  approaches  are  possible  for  solving  convex 
quadratic  programs. 

For  linear  programming  problems,  there  exists  an  input/output  format  that  has 
become  the  industry  standard  and  is  recognized  by  all  commercial  mathematical 
programming  systems — the  so-called  Mathematical  Program  System  (MPS)  format. 
Unfortunately,  there  is  no  equivalent  standard  format  for  quadratic  programming.  In 
Chapter  7  we  define  a  standard  for  specifying  a  QP  that  we  have  termed  the  QPS 
format.  The  basis  of  designing  the  QPS  format  has  been  to  adhere  as  closely  as 
possible  to  the  desirable  features  present  in  the  MPS  format.  A  feature  of  the  new 
format  is  that  it  is  simple  to  convert  an  MPS  file  into  a  QPS  file.  (A  program  to  do 
this  has  been  written  and  used  in  our  computational  experiments.) 

An  implementation  of  the  barrier  QP  algorithm  (termed  8ARQP)  has  been  de¬ 
veloped.  In  Chapter  8  the  details  of  the  implementation  are  described.  Barrier 
algorithms  are  sensitive  to  many  of  the  parameters  used  in  their  definition.  A  poor 
choice  of  such  parameters  can  lead  to  very  inefficient  algorithms.  Part  of  the  pur¬ 
pose  of  the  numerical  experimentation  has  been  to  determine  a  good  choice  for  these 
critical  parameters. 

In  order  to  extend  the  usefulness  of  the  implementation  to  as  wide  a  class  of  prob¬ 
lems  as  possible,  attention  is  given  to  a  number  of  numerical  issues.  For  example,  it 
may  be  that  the  constraint  matrix  is  ill-conditioned.  Inevitably  such  ill-conditioning 
is  reflected  in  the  KKT  system.  If  the  ill-conditioning  is  sufficiently  severe  the  algo¬ 
rithms  to  solve  such  systems  break  down.  We  show  how  to  modify  the  systems  to 
ensure  that  the  ill-conditioning  does  not  exceed  the  bound  for  which  the  algorithms 
break  down. 

In  contrast  to  the  LP  case,  there  does  not  exist  a  standard  set  of  QP  test  problems. 
Most  of  the  problems  v,e  have  used  in  our  tests  have  been  derived  from  a  well  known 
LP  test  set  by  changing  the  objective  function  to  be  quadratic.  We  have  also  tested 
our  algorithm  to  see  how  effective  it  is  at  solving  LP’s.  We  should  stress  that  the 
implementation  is  still  in  a  relatively  primitive  form.  It  has  not  been  our  purpose  to 
develop  an  implementation  suitable  for  general  distribution.  Results  are  presented 
and  a  comparison  mad''  with  the  performance  of  MINOS  (a  well  established  package 
for  the  solution  of  nonlinear  optimization  problems).  These  results  are  not  meant  as 
ar.  accurate  reflection  of  the  performance  of  a  barrier  QP  algorithm  but  are  intended 
to  serve  as  an  indication  of  the  algorithm’:-  potential. 


Chapter  2 

Fundamentals  of  Quadratic  Programming 


In  this  chapter  we  present  the  fundamental  properties  and  characteristics  of  the  gen¬ 
eral  quadratic  programming  (QP)  problem.  The  optimality  conditions  of  the  problem 
are  given,  and  additionally  we  motivate  some  of  the  issues  that  arise  in  developing 
practical  methods  for  solving  large-scale  quadratic  programs.  We  present  a  brief  de¬ 
scription  of  active-set  methods  for  dense  QP  problems  and  highlight  the  difficulties 
in  extending  these  algorithms  to  the  solution  of  large-scale  problems.  The  latter  is 
intended  to  explain  the  motivation  behind  new  approaches,  such  as  barrier  methods. 

2.1  Equality-Constrained  Quadratic  Programming 

We  first  consider  the  equality  constrained  quadratic  problem  (EQP).  This  is  the 
simplest  category  of  QP  problems,  yet  it  is  of  fundamental  importance  in  the  theory 
of  nonlinear  programming.  Problems  of  the  form  EQP  occur  as  subproblems  within 
many  active-set  methods  for  inequality  constrained  quadratic  programs  as  well  as  in 
sequential  quadratic  programming  methods  for  solving  general  nonlinear  programs. 

Formally,  the  EQP  problem  can  be  expressed  as 

EQP  minimize  w(x)  =  cTx  +  \xTHx 
^  rv  '  2  (2.1.1) 
subject  to  Ax  =  b, 

where  c  is  an  n-vector,  b  is  an  m-vector,  A  is  an  m  x  n  matrix  of  rank  m,  and  H  is 
an  n  x  n  symmetric  matrix.  The  gradient  of  <p  is  the  linear  function  g(x)  =  c  +  Hx, 
Note  that  H  is  the  Hessian  matrix  (of  second  partial  derivatives)  of  the  quadratic 
objective  function. 

When  a  set  of  m  linear  constraints  is  imposed  on  an  ?i-dimensional  minimization 
problem,  intuitively  we  expect  to  reduce  the  dimensionality  of  the  optimization  tn 
n  —  m.  Formally,  the  constraints  Ax  =  b  define  two  complementary  subspaces:  the 
m-dimensional  subspace  defined  by  the  rows  of  A  and  the  complementary  subspace 
of  vectors  orthogonal  to  the  rows  of  A.  Let  Y  denote  any  matrix  whose  columns  form 
a  basis  for  the  range  space  of  AT,  and  Z  denote  a  matrix  whose  columns  form  a  basis 
for  the  null  space  of  A ,  i.e.,  AZ  =  0.  We  emphasize  that  the  matrices  Y  and  Z  are 
not  unique. 
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Any  vector  x  can  be  written  as  a  linear  combination  of  the  columns  of  Y  and  Z, 
namely 

x  —  Yxy  +  Zxz 

for  some  m- vector  xY  and  an  (n  —  m)-vector  xz.  If  rc  is  a  feasible  point  of  EQP,  the 
vector  xY  is  the  solution  of  the  linear  system 


AYxy  =  b.  (2.1.2) 

When  A  has  full  row  rank,  the  matrix  AY  is  nonsingular  and  the  vector  xY  is  unique. 
Hence,  the  constraints  entirely  determine  the  range-space  portion  of  the  solution  x* 
of  EQP;  that  is,  xY  =  x* .  This  is  precisely  the  expected  reduction  in  dimensionality. 
It  is  convenient  both  conceptually  and  computationally  to  express  any  feasible  point 
x  of  EQP  as 

x  =  Y Xy  -f  Zxz, 

since  it  implies  that  solving  the  constrained  problem  EQP  can  be  viewed  as  minimizing 
the  following  unconstrained  problem  in  the  variables  xz: 

minimize  <^(x2)  =  (c  +  HYx*y)tZxz  +  \xr^ZTHZ)xz.  (2.1.3) 

The  matrix  ZTHZ  is  known  as  the  reduced  Hessian.  The  following  lemma,  which 
summarizes  the  characterization  of  a  minimizer  of  EQP  for  the  case  when  A  has  full 
row  rank,  follows  immediately  from  the  optimality  conditions  of  problem  (2.1.3). 


LEMMA  2.1.1.  Let  A  be  an  m  x  n  matrix  of  rank  m,  let  Z  denote  a  basis  for  the 
null  space  of  A,  and  let  x  =  Yx*y  +  Zxz,  where  AFx*  =  b  and  Y  is  a  basis  for  the 
range  space  of  Ar.  Then 

(i)  EQP  has  a  strong  local  minimizer  at  x*  if  and  only  if  x*z  is  a  stationary  point 
of  <f>(xz)  and  ZTHZ  is  positive  definite; 

(ii)  EQP  has  an  infinite  number  of  weak  solutions  if  and  only  if  there  is  a  stationary 
point  of  (p(xz)  and  ZTHZ  is  positive  semidefinite  and  singular; 

(iii)  EQP  has  no  solution  if  no  stationary  point  of  <p(xz)  exists. 

Note  that  when  the  local  minimizer  of  EQP  exists,  it  is  also  a  global  minimizer. 

If  A  is  not  of  full  row  rank,  in  addition  to  the  above  conditions  it  is  necessary 
for  the  vector  b  to  lie  in  the  column  space  of  A.  In  theory  we  can  always  remove 
dependent  constraints  such  that  the  resulting  A  has  full  row  rank;  however  there  may 
be  numerical  difficulties  in  recognizing  this  situation.  In  general  the  cost  of  achieving 
full  rank  is  not  negligible. 
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The  first-order  necessary  conditions  .<,at&  .hat  *  soluti<*".  ^  of  EQ  *  must  f^tisfy 
=  ZTVip{x*)  ~  ZTg{xf )  =  ZT(c  +  six*)  ~  0,  (2.1.4) 

or  equivalently,  there  exists  ir*  such  that  Ax*  =  b  and 

u{x*)~c  +  Hx*  =  Attt*  (2.1.5) 

fo>*  some  vector  of  Lagrange  multipliers  7 r*.  Thus,  x*  and  iC  satisfy  the  equation 


(hat 

0 


(2.1.6) 


The  matrix 


K  = 


H  AT\ 

A  0  J 


is  referred  to  as  the  Karush-Kuhn-Tucker  (KKT)  matrix,  and  (2.1.6)  is  the  KKT 
system  of  equations.  In  Chapter  3  we  discuss  the  KKT  system  in  the  context  of  the 
barrier  method.  The  above  system  can  also  be  derived  by  considering  the  Lagrangian 
function  associated  with  the  EQP  problem,  namely 


L(x,v)  =  cTx  +  ~xtHx  -  7rT(Ax  -  b). 

It  can  be  easily  verified  that  if  z"  and  n*  satisfy  (2.1.6),  then  (.T*,7r*)  is  a  stationary 
point  of  the  Lagrangian  function. 

Given  any  point  (x,7r),  let  (p, q)  be  the  step  to  the  solution  (ar*,7r*),  so  that 
x*  =  x  +  p  and  it*  =  z  -f  q.  It  is  convenient  to  rewrite  equations  (2.1.6)  in  terms  of  p 
and  q: 

*(1) -(';)■  (2-ij: 

whare  gL  =  c  +  Hx  -  ATir  is  the  gradient  of  the  Lagrangian  with  respect  to  x,  anc 
r  —  Ax  —  b  is  the  res  dual  for  the  constraints. 

We  can  also  characterize  a  minimizer  of  EQP  :n  terms  of  the  inertia  of  the  KKT 
matrix. 

DEFINITION  2.1.1.  The  inertia  of  a  symmetric  matrix  K  is  the  triplet 


In{I<)  =  (i+,i_,io), 

where  i+,i-  and  zo  arc  the  number  of  positive,  negative  end  zero  eigenvalues  of  K 
respectively , 
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The  following  lemma  gives  an  important  relationship  between  the  KKT  matrix 
and  the  reduce  ’  tTo‘:<qun  ZTHZ. 

Lemma  2.1.2.  Let  H  be  an  n  x  n  sym'  teiric  matrix  and  A  an  m  x  n  matrix  of 
full  row  rank.  If  Z  is  a  basis  for  th',  null  space  of  A,  then 

In(K)  =  In  (  H  A  )  =  /n(ZTUH)  +  (m,m,0).  (2  1.8) 

\A  0  ) 

Proof.  For  a  proof,  see  [Gou85]. 

Notice  that  the  KKT  maLir;  is  nonsir^Ta'  if  and  only  if  the  reduced  Hessian  is 
nonsinguiar.  Using  (2.1.8),  Lei  ima  2-1  .j.  can  also  he  stated  as  follows. 

LEMMA  2.1.3.  Let  A  bp  m  x  n  matrix  of  rank  m,  let  Z  denote  a  baeis  for  the 
null  space  of  A,  and  x  =  Yx^  +  Zx2)  where  AYx*.  ~  b  and  Y  is  a  basis  for  the 
range  space  of  At.  Then 

(i)  EQP  has  a  strong  local  minimizer  at  x*  if  x*  is  a  stationary  point  of  EQP  und 
In(K)  s=  (n,  m,0); 

(ii)  EQP  has  an  infinite  number  of  weak  solutions  if  there  is  a  stationary  point  of 
EQP  and  In{Ii )  =  (n  —  t,m,  t),  with  t  >  0; 

(iii)  EQP  has  no  solution  if  no  stationary  point  of  EQP  exists. 

A  detailed  discussion  of  the  conditions  for  the  existence  and  uniqueness  of  solut  ions 
of  the  EQP  problem  can  be  found  in  (Gou85|. 

2.2  Inequality-Constrained  Quadratic  Programming 

The  general  quadratic  programming  problem  is  to  find  a  local  minimizer  of  a  quadratic 
function  subject  to  linear  constraints.  There  are  many  mathematically  equivalent 
formulations  of  a  quadratic  program  (although  not  all  have  the  same  computationa' 
behavior).  We  first  consider  the  problem  in  the  following  pure  inequality  form: 

IQP  minimize  ip(x)  =  cTx  +  \xTHx 

x€»n  ’  2  (2.2.1) 

subject  to  Ax  >  /?, 

where  t,he  Hessian  H  is  an  n  x  n  symmetric  matrix,  A  is  an  m  x  n  matrix,  and  c  and 
/?  are  vectors  of  dimension  n  and  m  respectively.  An  important  special  case  of  the 
above  problem  occurs  when  H  is  positive  definite. 
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We  give  the  first-order  and  second-order  necessary  conditions  for  a  solution  of 
IQP.  In  order  to  do  so,  it  is  important  to  distinguish  between  constraints  that  are 
satisfied  exactly  at  the  solution,  and  those  that  are  not  binding. 

DEFINITION  2.2.1.  The  point  x  is  said  to  be  feasible  with  respect  to  the  inequality 
constraint  ajx  >  /?;  if  ajx  >  (3,.  ( That  is,  the  constraint  is  satisfied  at  £.)  The 
constraint  ajx  >  /?,•  is  said  to  be  active  at  x  if  ajx  —  /?,-  and  inactive  if  ujx  >/?,'.  Ij 
ajx  <  fa,  x  is  infeasible,  and  the  constraint  is  said  to  be  violated  at  x. 

The  active  "^nstrs'-.ts  play  a  special  role  because  they  restrict  feasible  perturba¬ 
tions.  If  a  constrai  it  *  inactive  at  the  point  x,  then  it  will  remain  inactive  for  any 
perturbation  \  r  miciently  small  neighborhood.  However,  an  active  constraint  may 
be  violated  by  certain  perturbations. 

Lemma  2.2.1.  ( Necessary  conditions.)  Let  A  be  the  t  x  n  matrix  containing  only 
those  rows  of  A  for  which  the  corresponding  constraint  is  active  at  the  point  x* ,  let  b 
be  the  corresponding  elements  of  (3,  and  let  Z  be  a  basis  for  the  null  space  of  A.  The 
point  x*  is  a  minimizer  of  IQP  only  if  there  exists  a  vector  tt*  such  that 

(i)  ATn*  —  Hx  -b  c,  or  equivauntly  ZT[Hx  -f  c)  =  0; 

(ii)  Ax*  >  /3;  Ax *  =  b; 

(iii)  7T*  >  0,  i  —  1,.  and 

(iv)  ZtHZ  is  positive  semidefinite. 

The  conditions  (i),  (ji)  and  (iii)  are  referred  to  as  the  first-order  necessary  condi¬ 
tions  and  (iv)  is  a  second-order  necessary  condition.  In  contrast  to  the  EQP  case,  the 
optimality  conditions  for  IQP  restrict  the  sign  of  the  Lagrange  multipliers  associated 
with  the  active  constraints  at  the  solution. 

LEMMA  2.2.2.  Sufficient  conditions  for  x*  to  be  a  solution  of  IQP  are  given  by 

a)  AJr*  =  Hx *  4-  r.  nr  equivalently  ZT(Hx*  +  c)  =  0; 

b)  Ax*  >  / 3 ;  Ax*  =  b; 

c)  7T*  >  0,  i  =  1,. . .  ,t;  and 

dj  ZTIIZ  is  positive  definite. 


so 
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Under  such  circumstances,  x*  is  an  isolated  minimizes 

An  extra  complication  over  the  equality-constrained  case  arises  when  the  Lagrange 
multiplier  corresponding  to  an  active  constraint  is  zero.  In  this  case,  any  set  of 
sufficient  conditions  must  include  certain  restrictions  to  account  for  perturbations 
that  are  binding  for  all  the  constraints  with  positive  Lagrange  multipliers  but  that 
may  be  binding  or  non-binding  for  constraints  with  zero  Lagrange  multipliers. 

When  the  sufficient  conditions  hold,  x*  is  not  only  optimal  but  is  also  locally 
unique ,  i.e.,  <p(x*)  <  <p(x)  for  all  feasible  x  in  a  neighborhood  of  x*  ( x *  ^  x).  The 
gap  between  the  necessary  conditions  and  the  sufficient  conditions  arises  from  the 
possibility  of  one  or  more  zero  Lagrange  multipliers  and/or  a  positive  definite  and 
singular  reduced  Hessian.  Sets  of  conditions  that  are  simultaneously  necessary  and 
sufficient  are  surprisingly  complicated  and  they  can  seldom  be  verified  in  practice. 

When  the  necessary  conditions  are  satisfied  but  the  sufficient  conditions  are  not, 
x*  may  or  may  not  be  a  local  solution  of  (2.2.1) — for  example,  a  feasible  direction 
of  decrease  may  exist.  Verification  of  optimality  in  such  instances  requires  further 
information,  and  is  in  general  an  NP-hard  problem  (see  Murty  and  Kabadi  [MK87] 
and  Pardalos  and  Schnitger  [PS88])  that  is  equivalent  to  the  copositivity  problem  of 
quadratic  programming  (see,  e.g.,  Contesse  [ConSO]  and  Majthay  [MajTlj). 

2.3  Convex  and  Nonconvex  Quadratic  Programming 

Quadratic  programs  can  be  grouped  into  two  broad  categories:  convex  and  nonconvex 
QP.  The  former  category  exhibits  some  interesting  properties;  for  example,  a  local 
minimizer  of  a  convex  QP  is  also  a  global  minimizer.  This  section  is  intended  to  point 
out  briefly  why  computing  the  solution  of  a  nonconvex  quadratic  program  constitutes 
a  much  more  difficult  task  than  that  of  solving  a  convex  quadratic  program. 

Consider  an  iterative  procedure  for  computing  the  solution  of  a  quadratic  program. 
Let  ZfHZk  be  the  reduced  Hessian  matrix  at  the  point  When  ZfHZk  is  positive 
definite,  the  search  direction  pk  at  x*  can  be  obtained  by  solving  the  Newton  equations 

ZjHZkPz  =  -Zjft  (2.3.1) 

and  setting  pk  =  Zpz.  It  can  be  verified  that  pk  is  a  descent  dheciion  for  the  quadratic 
function.  Furthermore,  the  minimizer  in  the  subspace  defined  by  Zk  is  unique. 

A  strictly  convex  quadratic  program  is  one  for  which  H  is  positive  definite.  In 
this  case,  ZjHZk  is  known  to  be  positive  definite  at.  every  iterate  xjt,  and  hence  p* 
is  always  well  defined.  For  convex  QP’s,  ZjHZk  might  be  only  positive  semidefinite. 
However,  every  minimizer  is  a  global  minimizer. 

In  noncon%fex  quadratic  programs,  complications  arise  when  the  reduced  Hessian 
ZjHZk  is  indefinite.  In  this  case,  a  point  satisfying  the  first-order  necessary  conditions 
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for  optimality  is  not  a  local  minimizer  in  the  current  null  space.  Indeed  there  may  be 
no  point  on  a  given  set  of  constraints  that  satisfies  the  first-order  necessary  conditions. 
Also  the  direction  defined  by  (2.3.1)  is  not  necessarily  a  descent  direction  for  the 
quadratic  function.  Hence,  a  mechanism  for  computing  a  feasible  direction  of  descent 
must  be  provided.  In  terms  of  efficiency,  it  is  desirable  for  such  a  mechanism  to 
retain  as  much  information  about  the  present  (indefinite)  Hessian  as  possible.  Hence, 
standard  approaches  for  modifying  an  indefinite  matrix  may  not  be  suitable.  Care 
must  be  exercised  in  updating  a  factorization  of  an  indefinite  matrix,  since  there  is 
a  danger  of  numerical  instability  (see  Gill  et  al.  [GMSW84c]  where  the  Hessian  is 
allowed  to  have  one  negative  eigenvalue). 

Finally,  with  nonconvex  quadratic  programs  there  may  exist  so-called  dead  points 
at  which  it  is  very  difficult  to  verify  optimality.  At  such  points,  ail  conventional 
quadratic  programming  methods  will  find  it  difficult  to  proceed,  since  it  can  be  shown 
that  the  problem  of  distinguishing  a  dead  point  that  is  not  a  minimizer  is  an  NP-hard 
problem  (see  Forsgren,  Gill  and  Murray  [FGM89b]  for  a  precise  definition  of  a  dead 
point  and  a  computational  scheme  within  the  context  of  inertia-controlling  methods 
for  QP  that  will  attempt  to  determine  if  a  dead  point  is  a  local  minimizer).  We 
emphasize  that  this  difficulty  is  inherent  in  the  problem,  and  is  independent  of  the 
solution  method. 


2.4  Formulating  a  QP  Problem 

The  quadratic  programming  problem  can  be  expressed  in  several  mathematically 
equivalent  forms.  For  reference  purposes,  we  introduce  in  this  section  several  formu¬ 
lations. 

The  General  Form. 


minimize  cFx  +  \xTHx 

rgJi"  2 

subject  to  Ai*  =  b\  ^  ^  j  j 

A2X  ^  62 

i  <  X  <  u, 

where  A\  and  A2  are  of  dimension  mi  x  n  and  m2  x  a  respectively.  Constraints 
Atx  =  bx  and  A2r  >  b2  are  called  the  general  constraints.  Thus,  formulation  (2.4.1) 
includes  a  mixture  of  general  equality  and  inequality  constraints  as  well  as  simple 
upper  and  lower  bounds  on  the  variables. 
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The  Standard  Form. 

cTx  +  \xtHx 

Ax  =  b  (2-4.2) 

£  <  x  <  u, 

where  all  the  general  constraints  are  equalities,  and  the  only  inequalities  are  upper 
and  lower  bounds  on  the  variables.  For  expository  purposes  we  shall  usually  omit  the 
upper  bounds  and  take  £  —  0. 

Any  quadratic  programming  problem  can  be  converted  to  standard  form.  It  is 
sometimes  convenient  to  transform  a  given  QP  problem  into  this  form.  For  instance, 
for  large-scale  quadratic  programs,  it  can  be  algorithmically  advantageous  to  assume 
that  the  constraints  are  posed  as  in  (2.4.2)  (see,  for  example,  Gill  et  al.  [GMSW87, 
GMSWS8]).  Indeed  it  will  be  seen  later  that  rather  than  being  convenient,  it  is 
essential  to  the  success  of  the  barrier  methods  described  here  to  convert  the  problem 
to  standard  form. 

A  QP  problem  stated  in  the  form  (2.4.1)  can  be  converted  into  the  standard  form 
by  introducing  slack  variables  or  by  using  duality.  We  shall  illustrate  the  latter  in 
Section  2.5. 

In  particular,  a  general  inequality  constraint  afx  >  /?,-  can  be  replaced  by  the 
equality  constraint  ajx  —  st  =  fa,  and  the  standard-form  version  of  the  problem 
includes  an  additional  slack  variable  subject  to  the  bound  s,  >  0.  Slack  variables  have 
many  special  features;  for  instance,  they  do  not  appear  in  the  objective  function.  By 
adding  m2  slack  variables  $,  problem  (2.4.1)  can  be  transformed  into  standard  form 
as  follows: 

minimize  cTx  +  \xTHx 

z€*n,  2 

subject  to  A\x  =  b\ 

A2x  —  s  =  b?  (2.4.3) 

£  <  x  <  u 
0  <  s  <  oo. 

2.5  Duality  in  Quadratic  Programming 

The  optimality  conditions  for  linear  and  quadratic  programming  problems  involve 
not  only  the  variables  x  associated  with  each  column  of  the  constraint  matrix  but 
also  Lagrange  multipliers  associated  with  each  row.  An  interesting  theory  ( duality 
thco ■  has  been  developed  to  explore  the  relationships  between  these  two  sets  of 
variuoles.  Traditionally,  the  variables  x  of  the  “original”  problem  are  called  the  primal 
variables,  and  the  original  problem  is  denoted  the  primal.  The  Lagrange  multipliers 
are  sometimes  referred  to  as  dual  variables. 


minimize 

z€K" 

subject  to 
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The  appearance  of  duality  theory  in  linear  programming  goes  back  to  the  classical 
minimax  theorem  of  von  Neumann  and  was  first  explicitly  given  by  Gale,  Kuhn  and 
Tucker  (see  [GKT51]).  The  concept  of  duality  also  applies  to  quadratic  programs. 
Duality  in  nonlinear  programming  started  with  the  results  on  quadratic  programming 
given  by  Dennis  [Den57,Den59]. 

We  shall  state  some  of  the  relationships  between  certain  primal  and  dual  QP 
programs.  In  what  follows  we  assume  that  H  is  positive  definite.  Duality  results 
can  be  extended  to  more  general  H  (see  Murray  [Mur69]),  but  the  conditions  are 
somewhat  complicated  and  difficult  to  verify  in  practice. 


Primal  QP  (standard  form): 


min  L  =  cTx  +  \xTHx 

subject  to  Ax  =  b  (2.5.1) 

x  >  0. 


Dual  QP: 


max 


fd  -  bTy  -  \wTHw 


subject  to 


ATy  -  Hw  <  c. 


(2.5.2) 


The  dual  (2.5.2)  can  itself  be  converted  to  standard  form  by  adding  n  nonnegative 
slack  variables  2,  thereby  giving  the  following  standard-form  dual: 


max 

y€»m,  «/,*€»" 


fd  =  bTy  -  \wrHw 


subject  to 


ATy  —  Hw  +  z  =  c,  2  >  0. 


(2.5.3) 


If  the  original  problem  is  in  pure  inequality  form  we  may  also  give  a  dual  form: 


Primal  QP  (inequality  contraints): 


Dual  QP: 


min  L  =  cTx  +  \xTHx 
xe«n  2 

subject  to  Ax  >  P- 


min 

V€Km,  *€Kn 

subject  to 


fd  =  PTy  -  \wTIIw 
ATy  —  Hw  =  c,  y  >  0. 


(2.5.4) 


(2.5.5) 
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The  standard  and  inequality  forms  are  included  in  the  following  general  form: 


Primal  QP: 

min  fp  =  cTx  +  \xTHx  +  aTy  +  \yTGy 

subject  to  Ax  +  Dy  >  b 

Bx  +  Ey  =  d 

x>0. 

Dual  QP: 

max  fd  =  bTu  +  dTv  —  \wTHw  —  hjTGy 

u,vtw,y  1  1 

subject  to  Atu  +  Btv  <  c  +  Hw 

Dtu  +  Erv  =  a  +  Gy 
u>  0. 


The  key  relationship  between  the  primal  and  dual  is  that  from  the  solution  of  one 
problem  we  may  recover  the  solution  of  the  other.  For  example,  in  the  case  of  the 
primal  in  standard  form  let  x*  denote  the  minimizer,  it*  the  Lagrange  multipliers  of 
the  equality  constraints,  and  z*  the  Lagrange  multipliers  of  the  bound  constraints.  It 
follows  that  the  solution  of  the  dual  problem  is  given  by  y  =  tt*,  w  =  x*  and  z  —  z*. 
Obviously  if  the  dual  problem  were  solved  we  could  recover  the  solution  of  the  primal 
problem. 

The  relationship  between  the  two  solutions  may  be  verified  by  observing  that  the 
necessary  conditions  for  a  solution  of  the  two  problems  reduce  to  the  same  system  of 
equations.  Another  interesting  relationship  between  the  primal  and  dual  problems  is 
that  for  any  primal  feasible  point  and  any  dual  feasible  point  we  have 

fv~h>  0. 

To  show  why  this  result  is  true  we  prove  it  when  the  primal  is  in  standard  form.  If  x 
is  primal  feasible  then 

Ax  =  b,  x  >  0.  (2.5.6) 

If  ( w,y,z )  is  dual  feasible  then 

ATy  -  Hw  +  z  =  c.  (2.5.7) 

Premultiplying  (2.5.7)  by  xT  and  substituting  b  for  Ax  we  obtain 


cfx  =  xTATy  —  xtHw  +  xTz 
=  bTy  —  xtHw  +  xTz. 
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Since 

( x  —  w)tH(x  —  w)  =  xtHx  —  2  x*Hw  +  wtHw, 

we  get 

cTx  +  | xtHx  =  bTy  —  \wtHxv  +  \{x  —  w)TH(x  —  w)  +  xTz. 

Hence, 

fP(x)  ~  fd{w,  y,  z)  =  xTz  +  f  (x  -  w)TH(x  -w)>  0.  (2.5.8) 

If  a  primal  feasible  point  and  a  dual  feasible  point  were  known  then  an  upper  and 
lower  bound  on  the  solution  would  also  be  known.  The  difference  fp  —  fj  is  referred 
to  as  the  duality  gap.  It  is  easily  seen  from  the  above  result  that 

fP(x*)  ~  fd{x*,y*,z*)  =  0.  (2.5.9) 


2.6  Active-set  Methods 

We  briefly  describe  one  of  the  most  popular  classes  of  methods  for  quadratic  program¬ 
ming,  namely  active-set  methods.  Our  primary  purpose  is  to  illustrate  the  difficulties 
that  arise  in  applying  such  methods  to  large-scale  problems,  thus  motivating  the 
barrier  approach.  It  will  also  give  us  a  basis  for  comparing  the  efficiency  of  barrier 
methods. 

If  the  active  set  at  the  solution  were  known  a  priori ,  the  solution  to  the  IQP 
problem  (2.2.1)  could  be  determined  by  solving  a  single  I\KT  system  of  equations. 
Active-set  methods  are  iterative  methods  that  maintain  an  estimate  of  the  set  of 
constraints  active  at  the  solution — called  the  working  set — which  is  a  linearly  inde¬ 
pendent  set  of  constraints.  In  the  active-set  methods  described  in  this  section,  it  is 
assumed  that  at  a  particular  iterate  x,  all  constraints  in  the  working-set  are  active. 

We  shall  illustrate  the  steps  of  a  QP  method  for  a  primal-feasible  active-set  method. 
Each  iteration  has  the  following  general  structure:  given  the  current  iterate  .r,  the 
next  iterate  is  defined  by 

x  =  x  +  ap, 

where  the  vector  p  is  the  search  direction,  and  the  nonnegative  scalar  a  is  the 
steplengtli.  An  initial  feasibility  phase  is  performed  to  find  a  point  that  satisfies  the 
constraints  of  (2.2.1),  and  all  iterates  are  thereafter  constructed  to  retain  feasibility. 
Thus,  if  Aw  denotes  the  working-set  matrix  and  bw  the  associated  right-hand  side 
vector,  we  have  Awx  =  bw. 

The  search  direction  p  is  defined  as  the  solution  of  the  following  equality- const  rained 

QP: 

minimize  gTp  +  lpTIIp 
Pe»"  * 

subject  to  Awp  =  0, 
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where  g  denotes  g(x),  the  gradient  of  ip  at  the  current  iterate.  The  constraints 
Awp  =  0  ensure  that  constraints  in  the  working  set  remain  unaltered  by  any  move 
along  p. 

The  following  (standard)  terminology  is  useful  in  characterizing  the  relationship 
between  p  and  p: 


p  is  a 


{ 


descent  direction  if  gTp  <  0; 

direction  of  negative  curvature  if  pTHp  <  0. 


Because  p  is  quadratic, 


p{x  +  ap )  =  p(x)  +  agTp  +  \a2pTlip.  (2.6.1) 

This  relation  shows  that  every  direction  p  for  which  p  decreases  must  be  either  a 
descent  direction,  or  a  direction  of  negative  curvature  with  gTp  —  0.  If  gTp  <  0  and 
pTHp  >  0,  we  see  from  (2.6.1)  that  p(x  +  ap)  <  p{x)  for  all  0  <  a  <  r,  where 
r  =  —2gTp/pTHp.  If  grp  <  0  and  pTHp  <  0,  or  if  gTp  =  0  and  pTIIp  <  0,  (2.6.1) 
shows  that  p  is  monotonically  decreasing  along  p,  i.e.,  p(x  +  ap)  <  p[x)  for  all  a  >  0. 

If  the  reduced  Hessian  with  respect  to  the  working  set  is  positive  definite,  a  min- 
imizer  on  the  subspace  defined  by  these  constraints  can  be  found.  The  step  p  to  the 
ininimizer  on  the  current  working  set  satisfies  the  equation 


where  zw  contains  the  Lagrange  multipliers  corresponding  to  the  constraints  in  the 
working  set.  The  minimizcr  on  the  subspace  is  then  found  as  x  +  p.  However, 
since  there  are  other  constraints  present  in  the  problem,  the  point  x  +  p  may  not  be 
feasible.  In  this  situation,  the  maximum  feasible  step  o  =  amax  along  p  is  computed. 
If  Qmax  <  1,  the  constraint  that  becomes  active  at  the  new  iterate  is  added  to  the 
working  set.  If  the  unit  step  along  p  is  accepted,  a  minimizer  on  the  subspace  has 
been  found.  The  Lagrange  multiplier  vector  is  then  examined  to  see  if  -w  >  0.  If 
not,  a  constraint  corresponding  to  a  negative  multiplier  is  deleted  from  the  working 
set. 

An  active-set  method  of  this  kind  usually  deletes  and  adds  only  one  constraint  at 
a  time.  Such  a  method  allows  use  uf  computationally  efficient  updating  schemes  for 
the  matrix  factorizations.  Strategies  that  add  and  delete  more  than  one  constraint 
at  a  time  are  possible. 

One  class  of  active-set  methods  is  the  inertia-controlling  methods  for  quadratic  pro¬ 
gramming  (ICQP  methods);  for  example  see  [FIe71,GMSW84c,GMSWSStGouS6a]. 
Such  methods  control  the  working  set  so  that  the  associated  reduced  Hessian  has 
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at  most  one  nonpositive  eigenvalue.  (For  a  full  description  of  ICQP  methods  see 
(GMSWSSj).  Usually  only  one  constraint  is  added  or  deleted  at  a  time.  The  working 
set  at  the  initial  point  must  be  chosen  so  that  the  reduced  Hessian  is  positive  definite. 
By  appending  certain  artificial  or  temporary  constraints  to  the  problem  this  condi¬ 
tion  can  be  satisfied  at  an  arbitrary  point.  A  necessary  requirement  for  an  artificial 
constraint  is  linear  independence  relative  io  constraints  already  in  the  working  set. 
Artificial  constraints  do  not  restrict  the  feasible  region,  since  they  may  be  eliminated 
from  the  problem  whenever  the  algorithm  permits. 

For  the  solution  of  moderate-sized  quadratic  programming  problems.  ICQF  meth¬ 
ods  have  proved  to  be  very  efficient.  This  is  due  in  part  to  the  use  of  sophisticated 
techniques  for  updating  factorizations  of  dense  matrices,  and  to  the  fact  that  the 
factorizations  needed  to  compute  a  direction  of  descent  may  also  be  used  to  compute 
a  direction  of  negative  curvature  when  such  a  direction  exists. 

Unfortunately,  the  efficient  application  of  active-set  methods  to  large-scale  prob¬ 
lems  is  non-trivial.  Only  two  large-scale  ICQP  methods  have  been  proposed  and  both 
are  based  on  a  direct  factorization  of  the  KKT  system.  Gould  [GouS6a]  has  proj>osed 
a  method  based  on  the  use  of  an  LU  factorization  together  with  procedures  to  update 
the  LU  factors.  Gill  et  al.  [GMSWS7]  have  proposed  a  method  based  on  an  LBLT 
factorization,  which  is  for  symmetric  indefinite  matrices  (see  Chapter  4  for  a  defini¬ 
tion  of  this  factorization).  The  updates  are  not  performed  on  the  factorization  itself. 
The  required  KKT  solution  may  be  recovered  by  solving  a  system  that  augments 
the  original  KKT  system.  The  solution  of  the  augmented  system  is  found  by  using 
the  Schur  complement  of  H  in  the  original  KKT  matrix.  (The  Schur  complement  is 
defined  in  Equation  (4.11.1).)  A  benefit  of  this  approach  is  that  a  Kblack-box"  LBLT 
package  may  be  used,  which  would  not  be  possible  if  it  were  necessary  to  update  the 
LBLt  factorization. 

A  key  difference  in  solving  large-scale  problems  is  that  updating  results  in  an 
increasingly  large  data  file  (whereas  in  the  dense  case  the  elements  of  the  factors  are 
updated  explicitly).  Eventually  the  data  file  exceeds  available  memory  or  becomes 
large  enough  that  it  is  worthwhile  to  discard  it  and  restart  with  a  new  refactorization 
of  the  current  KKT  matrix. 

Another  difficulty  in  the  large-scale  case  is  the  provision  of  artificial  constraints.  In 
the  dense  case  a  sophisticated  procedure  is  known  for  generating  artificial  constraints 
that  are  in  some  sense  ideal  (GMSWSla,  Section  5|-  For  example,  they  do  not  ex¬ 
acerbate  the  condition  of  the  KKT  system.  Unfortunately,  this  procedure  requires 
computing  an  orthogonal  basis  for  the  null  space  of  A,  which  is  not  practical  in  the 
large-scale  case.  The  only  procedure  that  can  be  guaranli’cd  to  generate  a  suitable 
starting  point  in  the  large-scale  case  may  generate  a  near  vertex  even  when  the  re¬ 
duced  Hessian  at  the  initial  point  has  only  a  few  negative  eigenvalues.  We  may  then 
require  many  iterations  simply  to  eliminate  the  artificial  constraints.  Such  difficulties 
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are  not  intractable  in  that  special  iterations  may  be  performed  to  try  and  eliminate 
many  constraints  at  a  single  iteration.  However,  the  special  iterations  are  themselves 
problematical. 

A  further  difficulty  with  an  active-set  strategy  is  that  the  number  of  iterations 
usually  grows  with  the  size  of  the  problem.  However,  it  is  known  that  the  number  of 
iteration.;  can  grow  exponentially  with  the  number  of  variables  (see  [Fat78]).  Thus, 
in  the  application  of  active-set  methods  to  large-scale  quadratic  problems  the  number 
of  iterations  is  likely  to  be  large  and  potentially  may  be  astronomical. 

It  will  be  seen  that  none  of  the  above  difficulties  arises  in  the  barrier  algorithms. 
However,  such  methods  do  have  some  unique  complications  of  their  own.  It  is  then  a 
question  of  which  difficulties  are  the  more  serious  when  solving  a  specific  problem. 


Chapter  3 

Barrier  Methods 


In  the  previous  chapter  we  considered  some  of  the  difficulties  in  applying  conventional 
methods  to  large-scale  problems.  Specifically,  we  emphasized  the  combinatorial  as¬ 
pect  introduced  by  active-set  methods.  In  this  chapter  we  present  the  fundamental 
properties  of  barrier  functions  and  give  a  model  barrier  algorithm  for  the  solution 
of  quadratic  programs.  The  observed  performance  of  barrier  methods  is  that  the 
required  number  of  iterations  to  achieve  some  specified  approximation  to  the  solu¬ 
tion  is  largely  independent  of  the  number  of  constraints  and  the  number  of  variables. 
Although  the  work  per  iteration  may  be  considerably  greater  than  in  active-set  meth¬ 
ods,  the  hope  is  that  the  number  of  iterations  is  so  small  that  the  total  effort  required 
by  the  barrier  approach  will  be  less. 

3.1  Optimality  Conditions 

Consider  a  problem  in  which  all  the  constraints  are  assumed  to  be  inequalities: 

NIP  minimize  F(x) 

(3.1.1) 

subject  to  c(x)  >  0, 

where  c(.t)  is  an  m-vector  of  nonlinear  functions  with  z-th  component  c,(x),  i  = 
l,...,m,  and  F  and  {c*}  are  twice-continuously  differentiable.  Let  g(x)  denote  the 
gradient  vector  of  F(x),  afx)  the  gradient  vector  of  c,(.t),  and  ^4(x)  the  m  x  n 
Jacobian  matrix  of  c(x).  A  solution  of  NIP  will  be  denoted  by  x*. 

Let  c(x)  denote  the  vector  of  constraints  that  are  active  at  x,  and  A(x)  the  Ja¬ 
cobian  of  c.  We  emphasize  that  A(x)  includes  only  the  gradients  of  the  active  con¬ 
straints,  whereas  „4(x)  is  the  Jacobian  of  all  the  constraints. 

The  derivation  of  optimality  conditions  for  NIP  requires  that  the  constraint  func¬ 
tions  satisfy  a  constraint  qualification  at  x* .  We  will  assume,  that  /l(x*)  has  full 
row  rank,  which  implies  that  the  constraint  qualification  holds  at  x* .  Necessary  and 
sufficient  conditions  for  a  feasible  point  x*  to  be  a  local  minimizer  of  NIP  arc  given 
in  the  following  theorems  (see,  e.g.,  Gill  et.  al.  [GMW81]): 

THEOREM  3.1.1.  (First-order  necessary  optimality  conditions.)  If  A(x*)  has  full 
row  rank ,  a  necessary  condition  for  a  feasible  point  x*  to  be  a  minimizer  of  NIP  is 
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that  there  exist  a  vector  A*  such  that 


g(x*)  =  with  A*  >  0. 


(3.1.2) 


THEOREM  3.1.2.  (Sufficient  optimality  conditions.)  Assuming  A(x*)  has  full  row 
rank  and  Z(x*)  is  a  basis  for  the  null  space  of  A(x*),  a  feasible  point  x*  is  a  strong 
local  minimizer  of  NIP  if  there  exists  a  vector  A*  such  that 

(i)  g(x*)  =  AT(x*)\* ; 

(ii)  A*  >  0; 

(iii)  Z(x*)TV2L(x* ,  A*)Z(.t*)  is  positive  definite. 


Condition  (ii)  of  Theorem  3.1.2 — that  the  Lagrange  multipliers  corresponding  to 
active  constraints  are  strictly  positive — is  usually  termed  strict  complementarity.  If 
any  multiplier  corresponding  to  an  active  constraint  is  zero,  the  optimality  conditions 
become  more  complicated.  (For  details,  see,  e.g.,  Fiacco  and  McCormick  [FM68].) 

3.2  Barrier  Functions 

Barrier-function  methods  constitute  a  class  of  sequential  minimization  methods  for 
solving  the  inequality  constrained  problem  NIP  (3.1.1),  where  the  feasible  region  has 
a  nonempty  interior.  They  are  characterized  by  requiring  strict  satisfaction  of  all 
constraints  at  the  initial  estimate  of  the  solution  and  subsequent  iterates.  This  can 
be  advantageous  if  the  objective  function  is  not  defined  outside  the  feasible  region, 
or  if  only  a  solution  of  limited  accuracy  is  required. 

A  barrier-function  method  creates  a  sequence  of  modified  functions  having  mini- 
nizers  that  are  strictly  feasible  to  the  constraints  of  NIP.  This  property  is  achieved  by 
introducing  a  weighted  barrier  term,  which  is  a  continuous  function  with  a  positive 
singularity  at  the  constraint  boundaries.  As  the  weight  assigned  to  the  barrier  term 
is  decreased  towards  zero,  the  minimizers  of  the  barrier  function  tend  to  minimizers 
of  the  original  problem.  Since  barrier  methods  generate  strictly  feasible  iterates,  they 
belong  to  the  class  of  so-called  interior-point  methods. 

Many  barrier  functions  have  been  considered  in  the  context  of  nonlinear  optimiza¬ 
tion  [FM68j.  In  this  dissertation  we  consider  only  the  logarithmic  barrier  function 
originally  suggested  by  Frisch,  1955  [FriSS].1  Application  of  the  logarithmic  barrier 

HVe  note  that  Frisch  did  not  minimize  the  barrier  function  directly,  but  instead  used  its  gradient 
in  combination  with  the  objective  function  to  enforce  feasibility. 
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Figure  3.1:  Contours  of  B(x,p)  for  p  =  0.1 


transformation  to  NIP  yields  the  following  subproblem: 

m 

BNP  minirnize  B{x,p)  =  F(x)  —  ^^ln(c,(x))  (3.2.1) 

l€*n  i=i 

where  the  positive  scalar  p  is  known  as  the  barrier  parameter. 

Some  fundamental  issues  have  to  be  addressed  when  using  a  barrier  method.  For 
example,  how  well  does  the  unconstrained  problem  approximate  the  constrained  one? 

Typical  graphs  of  the  contours  of  B{x,p)  for  two  values  of  p  are  illustrated  in 
Figures  3.1  and  3.2.  They  exhibit  some  of  the  features  of  barrier  functions.  As 
expected,  the  effects  of  the  logarithmic  terms  extend  further  into  the  interior  when 
p  is  large.  When  p  is  small  their  effect  is  negligible  away  from  a  small  neighborhood 
of  the  constraint  boundaries.  By  inspection  it  is  clear  that  for  any  positive  value  of 
p,  the  minimum  of  B(x,p)  can  never  lie  on  a  constraint  of  the  original  problem.  It 
can  also  be  observed  that  the  iterates  x*(p)  approach  x*  as  p  — *  0.  This  behavior  is 
expressed  formally  in  Theorem  3.3.1. 

Notice  also  that  as  p  decreases  the  contours  of  the  corresponding  barrier  function 
become  more  elongated,  almost  parallel.  Hence,  we  can  foresee  the  difficulty  a  method 
might  encounter  in  computing  accurately  the  component  of  the  search  direction  that 
is  tangential  to  the  constraints  active  at  x*. 
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Figure  3.2:  Contours  of  B{x,  p)  for  p  =  0.01 


3.3  Convergence  Results 

We  shall  consider  the  behavior  of  the  barrier  function  along  a  given  trajectory.  The 
first  theorem  shows  that  all  feasible  minimizers  of  B{x,  p)  are  related  to  those  of  NIP 
as  p  — »  0. 

THEOREM  3.3.1.  Let  x*  be  the  set  of  minimizers  of  NIP  (3.1.1),  and  x*(/*)  the 
set  of  feasible  minimizers  for  the  barrier  subproblem  BNP  (3.2.1).  Let  x*(p)  be  any 
point  in  y*(/i)  and  let  x*M  be  the  closest  point  to  x *(p)  in  x* •  If  conditions  (i)-(iii) 
of  Theorem  3.1.2  hold  then 

\\m\\x*(p)-x*M\\  =  0. 

Proof.  See  Fiacco  and  McCormick  [FM68]. 

Convergence  Analysis. 

For  completeness  we  present  a  result  equivalent  to  the  one  stated  in  Theorem 
3.3.1,  but  for  the  special  case  of  strictly  convex  quadratic  programs. 

THEOREM  3.3.2.  Consider  the  following  quadratic  program: 

minimize  fjx)  =  cTx  +  ~xTHx 

xgRn  1 

subject  to  Ax  =  6,  x  >  0, 


(3.3.1) 


3.3.  Convergence  Results 
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ivhere  A  is  an  m  x  n  matrix  of  full  row  rank.  We  shall  assume  that 

(i)  strict  complementarity  holds  at  the  solution; 

(ii)  a  strictly  feasible  point  exists; 

(iii)  the  feasible  region  is  compact; 

(iv)  the  matrix  Ii  is  positive  definite. 

Then, 

lim  x*(p)  =  x*, 

u—o 

where  x*  is  the  solution  of  the  QP  and  x*(p)  is  the  minimizer  of 


minimize 
subject  to 


B(x,  p)  =  cTx  +  | xTHx  —  /i£  In  Xj 

j= i 


Ax  =  b. 


Proof.  Let  D  =  diag(x_,)  be  defined  for  any  x  >  0,  and  let  Z  be  a  basis  for 
the  null  space  of  A.  If  //  is  positive  definite,  the  reduced  Hessian  for  the  barrier 
subproblem,  namely  ZT{H  +  pD~2)Z ,  must  also  be  positive  definite  for  any  p  >  0 
and  x  >  0.  Hence  the  minimizer  x*(p)  and  the  associated  Lagrange  multipliers  tt *(p) 
exist  and  are  unique.  From  the  first-order  necessary  conditions  (3.1.2)  for  the  barrier 
subproblem  (3.3.1),  we  have  that 

V,£  =  -  AVw  =  0, 


i.e., 

c+  Hx*{p)  -  pD-'e  -  Arir*(p)  =  0  (3.3.2) 

where  L  =  D(x,p)  —  r7(/lx  —  b).  Premultiplying  this  equation  by  x*(p)  gives 

cTx*(p)  +  x*{p)rHx*{p)  -  peTD~lx*(p)  —  x* (p)rArri* (p)  =  0. 

Since  Ax*{p)  =  6, 

cTx*(p )  +  x*(p)TIIx*(p)  —  bTir*  ( p )  =  np.  (3.3.3) 

Defining  x*(0)  =  lim;x_0x*(/*),  we  see  that 

cV( 0)  +  x*(O)r//.r*(0)  -  6V(0)  =  0. 


(3.3,1) 
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The  dual  of  the  quadratic  program  considered  is 

maximize  /d(u/,  ?r)  =  bTir  —  \wTHw 

2  (3.3.5) 

subject  to  ATir  —  Hw  <  c. 

Since  pD^e  >  0,  it  follows  from  (3.3.2)  that  (x*(p)1i:*(p))  is  dual  feasible.  In 
particular,  (x*(0),7r*(0))  is  dual  feasible.  Also,  the  duality  gap  is 

fp(x )  —  fd{u>,  ir)  =  cTx  +  t-xtHx  -  bTir  +  | wTHw , 

so  that 

fP{x*( 0))  -  /(f(x*(0),7r+(0))  =  cV'(O)  +  x*(0)tHx*(0)  -  6V(0)  =  0, 

using  (3.3.4).  It  follows  from  (2.5.9)  that  x*  =  i*(0). 

I 

Existence  of  an  Isolated  Trajectory. 

Under  suitable  assumptions,  the  set  of  feasible  minimizers  of  the  barrier  function 
can  be  regarded  as  a  function  of  an  independent  variable  p,  tracing  out  a  smooth 
tmjectory  of  points  x*(p)  converging  to  x*. 

The  assumptions  needed  to  define  a  unique  trajectory  of  local  minima  are  stronger 
than  those  needed  to  prove  the  existence  of  points  converging  to  a  local  minimizer  of 
NIP.  We  are  interested  in  an  isolated  trajectory ,  i.e.,  a  trajectory  that  is  locally  unique. 
An  isolated  trajectory  is  a  continuous  function  x*(/i),  where  every  point  x*(ji)  on  the 
trajectory  is  a  feasible  isolated  local  minimum  of  B(x,fi).  The  following  result  is 
proved  in  Fiacco  and  McCormick  [FM68]. 

THEOREM  3.3.3.  If  A(x*)  has  full  row  rank  and  the  sufficient  conditions  of  The¬ 
orem  3.1.2  hold  at  x* ,  then  for  sufficiently  small  p  there  exists  a  continuously  differ¬ 
entiable  trajectory  x* (p)  such  that 

lim  x*(p )  =  x* , 
o 

where  x*(p)  is  a  local  minimizer  of  B(x,p). 

Figure  3.3  depicts  the  trajectory  of  minimizers  of  the  barrier  function  applied  to 
the  convex  QP  problem  given  in  Chapter  1. 

The  trajectory  x*(p)  has  several  interesting  properties.  Expanding  about  p  =  0 
gives  the  following  expression: 

x{p)  =  x*  +  py  +  0{p2), 


where 
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Figure  3.3:  Trajectory  of  x*(/z),  with  p0  >  fti  >  /x2 


x(n)  -  x*  dx(fi) 
y  =  lim  ~ 

ii—o  ft  dft 


(3.3.6) 


y  •••••  —  f  1  »  ^  VVf. 

ft  dft  |M=o 

By  definition  of  an  unconstrained  minimizer,  the  following  relation  holds  at  x*{fi): 


mi  ‘ 

VB  =  g-n^~ai  -g-AT  •  =0. 

«=!  *  , 

WW 

Differentiating  (3.3.7)  and  using  (3.3.6),  we  obtain  the  following  expression: 


(3.3.7) 


Ay  = 


(3.3.8) 


The  relationship  (3.3.8)  implies  that  the  minimizers  of  successive  barrier  functions  do 
not  approach  x*  tangentially  to  any  constraint  for  which  0  <  A*  <  oo. 
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Existence  of  Derivatives  of  x(p)  for  p  >  0. 

In  general,  the  trajectory  has  an  order  of  differentiability  with  respect  to  p  (when 
p  >  0)  that  is  one  less  than  that  of  the  original  problem  functions  and  is  analytic 
when  the  functions  are  analytic.  It  has  also  been  shown  [FM68]  that  under  certain 
assumptions  the  limits  of  the  derivatives  at  p  =  0  exist  and  are  finite.  This  has  some 
computational  implications. 

3.4  Properties  of  Barrier  Functions 


Let  {/a}  be  a  strictly  decreasing  positive  sequence,  with  \\mk~>oo  l<-k  =  0.  The  min- 
imizers  of  successive  barrier  functions  along  a  given  trajectory  exhibit  the  following 
properties: 

(i)  {Bh}  is  strictly  decreasing  for  sufficiently  small  pk  and  bounded  c*; 

(ii)  {/^}  is  nonincreasing; 

(iii)  —  ln{ci(x*(/t*))}  is  nondecreasing, 

where  Bk  denotes  B(x*(pk),pk),  Fk  denotes  F(x*(pk))  and  ck  denotes  c(x*(pk)). 
Property  (iii)  does  not  imply  that  all  constraint  values  decrease  at  successive  x*(pk)- 
A  reduction  in  the  barrier  parameter  allows  the  constraints  to  approach  the  boundary 
of  the  feasible  region,  but  does  not  enforce  a  decrease  in  the  components  of  ck. 

Since  p  >  0  and  c,  >  0  for  all  t,  identity  (3.3.7)  shows  that  the  gradient  of  F 
at  x* (p)  is  a  nonnegative  linear  combination  of  all  the  constraint  gradients,  where 
the  coefficient  of  a,  is  p/ci.  As  p  approaches  zero,  the  quantity  p/ct(x*(p))  will 
converge  to  zero  if  c,  is  not  active  at  x*,  since  q  is  strictly  bounded  away  from  zero 
in  a  neighborhood  of  x*.  Assume  that  m  constraints  are  active  at  x  .  Then  for 
sufficiently  small  /*,  the  relation  holding  at  x*(p)  can  be  written  as 


9  =  A1 


'  P/ci  ^ 

W4/ 


+  0(p), 


(3.4.1) 


where  ci  denotes  the  i-th  active  constraint,  and  A  denotes  the  rnxn  matrix  of  active 
constraint  gradients. 

It  follows  from  (3.4.1)  that  the  quantity 

s  WuT) y 


(3.4.2) 
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defined  only  for  the  active  constraints,  satisfies  a  relationship  with  g  and  A  analogous 
to  the  multiplier  relation  that  must  hold  at  x*.  The  vector  X(g)  satisfies 

Afo)  «  A*  +  0(g), 

where  X*  is  the  vector  of  Lagrange  multipliers  at  x*. 

3.5  Barrier  Methods 

At  first  sight,  it  might  seem  that  the  solution  to  NIP  could  be  found  simply  by  setting 
g  to  a  very  small  value  and  using  a  standard  unconstrained  method.  Unfortunately, 
there  are  difficulties  associated  with  computing  an  unconstrained  minimum  of  (3.2.1) 
for  a  small  value  of  g.  Also,  for  a  given  problem  it  may  not  be  known  a  priori  what  the 
relatively  small  value  of  g  should  be.  Hence  a  sequence  of  subproblems  with  decreasing 
values  of  the  barrier  parameter  must  be  solved.  The  choice  of  values  for  g  constitutes 
an  interesting  subject  of  study.  The  efficiency  of  the  approach  critically  depends  upon 
a  sensible  strategy  for  choosing  g.  We  are  also  required  to  solve  an  unconstrained 
problem  whose  objective  function  contains  a  barrier  term.  Such  problems  are  difficult 
to  solve  for  several  well  known  reasons. 

Most  practical  barrier  methods  proceed  as  a  classical  continuation  algorithm, 
where  the  solution  from  the  previous  minimization  problem  is  used  as  an  initial  es¬ 
timate  of  the  solution  for  the  next  problem.  If  am  accurate  solution  of  a  subproblem 
is  found  it  has  been  shown  (e.g.,  see  [FM68])  that  advantage  may  be  taken  of  the 
asymptotic  behavior  of  h  =  x*(gk)  —  x*  to  estimate  x*(gk+i),  the  solution  for  the 
next  barrier  subproblem. 

3.6  A  Model  Algorithm 

In  this  section  we  shall  outline  a  model  algorithm  for  solving 

minimize  F(x) 
zexn  ' 

subject  to  Ax  =  b  \  3.6.1) 

c(x)  >  0, 

using  a  barrier  function.  Assume  the  following  quantities  are  known:  a  strictly  feasible 
initial  point  x0  ( Ax0  =  6,  c(x0)  >  0),  a  sequence  {gt}  such  that  lim =  0  and 
/'Jt  >  gk+i  >  0,  and  a  sequence  {£*},  where  M  >  6*  >  0. 


38 


Chapter  3.  Barrier  Methods 


Algorithm  BARALG  (Model  Barrier  Algorithm) 

BA1.  [Initialize]  Set  x0  =  xo,  k  =  0. 

BA2.  [Solve  the  barrier  subproblem]  Using  Xk  as  the  first  point,  generate  a  set  of 
strictly  feasible  points  {£,0**)},  s  =  0, 1, . . . ,  sk,  that  estimate  a  solution  x*(pk) 
of  the  subproblem 

m 

minimize  B(xt  pk)  =  F(x)  -  pkJ2  In  cf(x)  g 

subject  to  Ax  =  6, 

where  x,k(fjLk)  is  the  only  point  of  the  set  that  satisfies  the  conditions 

||2rV£(xn(^),/'*)ll  <  Skpk,  (3.6.3) 

A/t  >  —6kpk,  (3.6.4) 

where  A*  is  the  smallest  eigenvalue  of  ZrV2B(xlk(pk),  pk)Z,  and  Z  is  a  matrix 
whose  columns  span  the  null  space  of  A. 

BAS.  [Update  the  estimate  of  the  solution]  Set  xk+l  =  x,k(pk). 

BA4.  {Update  the  iteration  count]  Set  k  =  k  -f  1  and  return  to  step  BA2. 

Since  i;m*-.ooP*  =  0,  it  is  self-evident  that  lim/t_oo  ||a:fc  -  a$||  =  0,  where  x*k  is 
the  nearest  point  to  xk  in  x*  (the  set  of  minimizers  of  3.6.1).  The  difficulty  lies  in 
being  able  to  generate  the  set  of  points  £j(pk)  so  that  a  point  satisfying  the  relevant 
conditions  will  eventually  be  found.  This  issue  is  addressed  in  the  next  chapter. 


Chapter  4 

Solution  of  Barrier  Subproblems 

4.1  Introduction 

In  the  previous  chapter  we  described  a  barrier  algorithm.  The  key  step  was  to  de¬ 
termine  a  point  that  satisfies  conditions  (3.6.3)  and  (3.6.4)  for  each  value  of  //.  A 
minimizer  of  the  barrier  subproblem  is  one  such  point.  Moreover,  all  points  in  a  finite 
neighborhood  of  a  minimizer  satisfy  these  conditions.  Therefore  a  means  of  finding 
the  required  point  is  to  use  an  algorithm  that  finds  a  minimizer  of  the  barrier  subprob¬ 
lem.  Since  all  points  in  a  finite  neighborhood  satisfy  the  conditions,  any  algorithm 
that  generates  a  subsequence  converging  to  a  minimizer  will  find  a  suitable  point  in 
a  finite  number  of  iterations. 

A  difficulty  with  general  nonlinear  minimization  is  that  no  known  practical  al¬ 
gorithms  are  guaranteed  to  find  minimizers.  The  best  that  can  be  hoped  for  is  an 
algorithm  to  find  a  point  that  satisfies  the  necessary  conditions  for  optimality.  We 
can  assume  that  any  such  point  is  a  minimizer,  or  accept  the  fact  that  the  barrier 
algorithm  may  converge  to  a  point  that  satisfies  only  the  necessary  conditions.  This 
is  no  less  satisfactory  than  for  current  active-set  methods.  Note  that  for  the  orig¬ 
inal  QP,  any  point  satisfying  the  necessary  conditions  is  a  minimizer  provided  all 
active  constraints  have  nonzero  multipliers,  and  the  only  difficulty  is  distinguishing 
dead-points  from  minimizers  (see  Section  2.3). 

In  this  chapter  we  discuss  algorithms  for  finding  minimizers  of  the  barrier  subprob¬ 
lem  (3.6.2)  arising  in  step  BA2  of  the  model  barrier  algorithm  BARALG.  It  wili  be 
seen  that  for  the  large-scale  case,  such  algorithms  are  both  rare  and  complex.  We  also 
propose  an  alternative  to  a  general-purpose  algorithm  that  takes  specific  advantage 
of  the  form  of  the  barrier  subproblem. 

Whatever  approach  is  used,  the  subproblem  has  some  special  difficulties  not 
present  in  the  usual  equality-constrained  optimization  problem.  For  example,  with  a 
normal  problem  it  is  trivial  to  provide  an  initial  point  that  satisfies  the  linear  equal¬ 
ity  constraints,  but  here  the  point  must  also  be  strictly  interior  with  respect  to  the 
bounds.  If  a  trust-region  approach  is  used  [DS83,FleS7],  it  would  imply  that  the 
usual  trust-region  subproblem  has  hidden  constraints,  which  may  therefore  not  be 
satisfied  at  the  solution.  Consequently,  it  may  be  necessary  re-solve  the  trust-region 
subproblem.  We  have  restricted  our  attention  to  linesearch  methods. 
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4.2  Optimality  Conditions 

Barrier  subproblems  in  algorithm  BARALG  have  the  following  form: 

LEP  minimize  Fix ) 

ieR"  (4.2.1) 

subject  to  Ax  —  b, 

where  F(x)  is  a  twice  continuously  differentiable  function  and  A  is  an  m  x  n  matrix 
of  full  row  rank.  Let  g(x)  and  HF(x)  denote  VF(i)  and  V2F(x)  respectively. 

We  begin  by  stating  the  optimality  conditions  for  LEP.  The  necessary  conditions 
for  x*  to  be  a  solution  of  LEP  are  given  in  the  following  theorem. 

THEOREM  4.2.1.  (Necessary  conditions  for  a  solution  of  LEP).  If  A  has  full  row 
rank  and  Z  is  a  basis  for  the  null  space  of  A,  necessary  conditions  for  a  feasible  point 
x*  to  be  a  minimizer  of  LEP  are 

(i)  there  exists  a  vector  x*  such  that  g(x*)  =  ATx* , 
or  equivalently,  ZTg{x*)  =  0;  and 

(ii)  ZtHf(x*)Z  is  positive  semidefinite. 


The  vector  ZTg(x)  is  termed  the  reduced  gradient  of  F  at  x.  Any  point  x  such 
that  ZTg(x)  —  0  is  termed  a  constrained  stationary  point  of  F  (with  respect  to  the 
constraints  Ax  =  6).  The  matrix  ZTHF(x)Z  is  called  the  reduced  Hessian. 

Sufficient  conditions  for  optimality  are  analogous  to  those  for  NIP  (see  Theo¬ 
rem  3.1.2),  except  that  no  sign  restriction  on  x*  is  imposed  and  the  Hessian  of  the 
Lagrangian  becomes  the  Hessian  of  F(x)  itself. 

THEOREM  4.2.2.  (Sufficient  conditions  for  an  isolated  solution  of  LEP).  If  A  has 
full  row  rank  and  Z  is  a  basis  for  the  null  space  of  A,  a  feasible  point  x*  is  an  isolated 
solution  of  LEP  if 

(i)  there  exists  a  vector  x*  suck  that  g(x*)  =  ATx*, 
or  equivalently,  ZTg(x*)  =  0;  and 

(ii)  ZTHF(x*)Z  is  positive  definite. 

If  x*  satisfies  the  necessary  conditions  of  Theorem  4.2.1,  the  problem  of  determin¬ 
ing  whether  or  not  it  is  an  isolated  minimizer  is  an  NP-hard  problem. 


4-3.  Methods  for  Solving  LEP 
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4.3  Methods  for  Solving  LEP 

Almost  all  current  methods  for  solving  LEP  are  feasible  descent  methods.  Given 
an  initial  feasible  point,  a  sequence  {x*}  is  generated  that  retains  feasibility  and 
for  which  {F(xk)}  is  strictly  monotonically  decreasing.  Normally  all  function  values 
F(xk)  must  be  computed.  For  a  logarithmic  barrier  function  it  means  evaluating 
many  logs.  While  this  is  not  prohibitively  expensive  it  is  not  wholly  desirable.  We 
shall  discuss  means  of  avoiding  such  computations. 

We  shall  be  interested  in  linesearch  methods,  in  which  a  step  s  is  chosen  to  achieve 
a  ‘‘sufficient”  decrease  in  some  merit  function.  Initially  we  assume  that  the  current 
point  x  satisfies  Ax  =  6,  and  we  use  F(x)  itself  as  a  merit  function.  The  next  iterate 
x  -r  s  must  satisfy  A(x  +  s)  —  b,  and  F(x  +  s)  must  be  sufficiently  less  than  F(x). 
The  exact  meaning  of  “sufficiently  less”  will  be  made  clear  later. 

Ordinarily,  obtaining  a  feasible  point  to  a  set  of  linear  equality  constraints  is  a 
trivial  problem.  However,  in  our  case  we  require  an  initial  point  that  satisfies  Ax  —  b 
and  x  >  0.  We  shall  therefore  consider  an  approach  that  docs  not  require  the  initial 
point  to  be  feasible  with  respect  to  the  linear  equality  constraints. 

We  shall  restrict  our  interest  to  Newton’s  method  and  its  many  variants. 

4.4  TVansformation  to  an  Unconstrained  Problem 

We  may  solve  LEP  by  first  transforming  it  to  an  unconstrained  problem.  Suppose  a 
point  x0  is  known  such  that  Aio  =  b.  A  solution  of  LEP  is  given  by  x0  +  Zy* ,  where 
y*  is  a  minimizer  of  the  function  !F{y)  S  F{x o  +  Zy ).  If  Newton  s  method  is  applied 
to  the  problem 

minify), 

V’ 

the  search  direction  in  the  y-spacc,  say  pz,  is  given  as  the  solution  of  the  following 
equations: 

ZTIIFZpz  =  —ZTg.  (4.4.!) 

The  equivalent  search  direction  in  the  x*space  is  given  by  p  —  Zpz.  Cleariy  this 
approach  is  practical  only  if  it  is  computationally  convenient  to  form  the  products 
ZtHfZ  and  ZTg  for  some  appropriate  matrix  Z. 

Many  methods  have  been  proposed  to  solve  unconstrained  minimization  prob¬ 
lems  (see,  e.g.,  Fiacco  and  McCormick  [FM6S],  Gill  and  Murray  [GM74].  McCormick 
[McG77j.  Fletcher  and  Freeman  [FF77],  Mukai  and  Polak  [MP7S],  Kaniei  and  Pax 
|KD79],  More  and  Sorensen  [MS79|,  Goldfarb  (GoISO]  and  Forsgren.  Giii  and  Murray 
iFGMSOa])  and  some  can  be  extended  to  the  large-scale  case.  The  methods  arc  based 
mainly  on  computing  a  direction  of  sufficient  descent  and  a  direction  of  sufficient 
negative  curvature  whenever  such  directions  exist.  The  methods  vary  on  how  to  com¬ 
pute  such  directions.  Typically  a  descent  direction  is  determined  by  first  identifying  a 
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matrix  E  such  that  ZTHFZ  +  E  is  positive  definite.  The  modified  matrix  is  then  used 
in  place  of  ZTHFZ  in  (4.4.1).  Computing  a  direction  of  negative  curvature  is  often 
more  difficult  but  a  number  of  methods  are  known  [McC77,MS79,Gol80,FGM8S 

In  some  applications  such  as  structural  analysis,  A  is  such  that  a  sparse  null-sp^e 
matrix  Z  can  be  found.  If  HF  is  diagonal  and  positive  definite  (as  in  the  barrier 
subproblems  for  the  LP  case),  it  is  conceivable  that  Z^HpZ  might  also  be  sparse, 
and  that  current  sparse  Cholesky  factorization  [GL81,CGLN84]  methods  could  be 
applied. 

For  a  general  HF,  however,  ZTHFZ  is  not  likely  to  be  sparse,  and  a  direct  fac¬ 
torization  will  not  be  practical  unless  n  —  m  (the  column  dimension  of  Z)  is  quite 
small  (say  <  50).  The  difficulty  is  not  the  need  to  store  a  dense  matrix  but  rather 
the  considerable  effort  require  to  form  the  matrix  product.  Some  economy  is  possible 
in  the  case  of  the  barrier  function  when  all  the  inequalities  are  simple  bounds.  Ine 
Hessian  is  then  of  the  form  Hz  =  H  +  D,  where  D  is  a  diagonal  matrix.  Only  D 
is  a  function  of  x ,  and  we  may  form  ZTHFZ  by  forming  ZTH Z  only  once.  If  Z  is 
sufficiently  sparse  (as  in  the  structural  analysis  example),  each  subsequent  matrix 
ZTDZ  may  be  sparse  and  cheap  to  form. 

If  HP  and  Z  have  the  property  that  products  of  the  form  HFu,  Zv  and  Zrw  can 
be  computed  efficiently,  a  conjugate-gradient  method  [HS52,GL89]  could  be  applied. 
If  ZTHFZ  is  positive-definite  for  all  relevant  values  of  HF(x ),  the  theory  of  inexact 
Newton  methods  [DES82]  shows  that  pz  need  not  be  computed  accurately  (but  the 
accuracy  should  increase  as  x  —*  x*). 

In  general,  conjugate-gradient  methods  are  likely  to  require  too  many  iterations 
unless  the  system  involved  has  a  low  condition  number  or  a  favorable  clustering  of 
eigenvalues.  Sometimes  a  preconditioner  M  =  CCT  might  be  known  that  induces 
such  properties  in  the  transformed  system  C~1(ZTHFZ)C~T .  (See  [GL89]  for  a  dis¬ 
cussion  of  preconditioners.)  However,  there  is  no  general  procedure  for  finding  a  good 
preconditioner  in  this  context. 

4.5  The  KKT  System 

It  has  been  shown  that  while  a  null-space  approach  may  be  possible  for  some  problems 
it  is  not  viable  for  general  problems.  Rather  than  using  (4.4.1),  a  mathematically 
equivalen*  way  to  obtain  p  is  to  solve  the  FKT  system 


4.6.  The  LBLt  Factorization 
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Recently  Forsgren  and  Murray  [FM90]  have  proposed  a  method  based  on  a  direct 
factorization  of  K.  The  approach  we  have  taken  in  our  implementation  is  similar 
and  involves  the  same  matrix  K  (see  Section  4.7).  When  A  and  II F  are  sparse,  it  is 
reasonable  to  suppose  that  a  sparse  factorization  of  K  exists. 

When  ZTHFZ  is  not  available,  the  many  methods  to  obtain  both  a  descent  direc¬ 
tion  and  a  direction  of  negative  curvature  are  no  longer  applicable.  When  ZTHFZ  is 
not  positive  definite  suitable  descent  direction  can  be  computed  by  replacing  this 
matrix  by  any  posi'  definite  matrix.  If  the  inertia  of  K  is  known,  we  can  determine 
when  ZTHFZ  is  m  positive  definite  by  applying  the  result  of  Lemma  2.1.2.  Unfor¬ 
tunately  if  K  does  not  have  the  correct  inertia,  to  obtain  a  suitable  direction  it  is  not 
sufficient  to  replace  K  with  just  any  matrix  with  suitable  inertia.  More  significantly, 
it  was  not  clear  until  the  work  of  Forsgren  and  Murray  how  to  compute  a  direction 
of  negative  curvature  efficiently  knowing  only  a  factorization  of  K. 


4.6  The  LBLT  Factorization 

Since  K  is  symmetric  but  indefinite,  the  preferred  method  of  determining  a  solution 
to  (4.5.1)  is  via  the  factorization  PTI(P  =  LBLT,  where  P  is  a  permutation  matrix, 
L  is  unit  lower  triangular,  and  B  is  block  diagonal  with  blocks  of  dimension  1  or  2; 
see  Bunch  and  Parlett  [BP71],  Bunch  and  Kaufman  [BK77]. 

The  LBLr  factorization  in  the  large-scale  case  is  performed  in  two  stages:  an 
analyze  phase  and  a  numerical  phase.  The  analyze  phase  is  a  symbolic  factorization 
that  determines  a  pivoting  strategy  (a  row  and  column  ordering)  to  minimize  the 
fill-in  in  the  factor  L.  Since  the  sparsity  pattern  of  the  KKT  systems  is  constant,  the 
analyze  phase  is  needed  only  once.  A  strategy  commonly  used  is  the  minimum-degree 
ordering,  which  seeks  to  reduce  fill-in  by  choosing  a  diagonal  pivot  corresponding  to 
th"  r  .-v  that  currently  has  the  least  number  of  nonzero  elements.  In  the  numerical 
ph  *  v  .,he  pivot  order  may  be  changed  to  ensure  numerical  stability.  A  consequence 
of  t.  .o  second  level  of  ordering  is  that  ||L||  is  bounded.  Forsgren  and  Murray  [FM90] 
have  suggested  a  third  level  of  ordering,  in  which  the  objective  is  to  ensure  that 
the  inertia  of  the  partially  formed  factorization  satisfies  certain  rules.  The  pivoting 
strategy  they  propose  is  shown  to  be  both  necessary  and  sufficient  for  computing  a 
direction  of  sufficient  descent  and  a  direction  of  sufficient  negative  curvature  (when 
such  a  direction  exists)  using  the  LBLT  factorization. 

Having  obtained  a  direction  of  sufficient  descent  and  a  direction  of  sufficient  neg¬ 
ative  curvature,  Forsgren  and  Murray  suggest  that  the  two  directions  be  used  in  a 
curvilinear  linesearch,  or  combined  in  a  particular  fashion  for  a  regular  linesearch.  In 
our  case  we  have  a  strong  preference  for  a  regular  linesearch,  since  it  is  imperative 
that  the  steplength  be  chosen  to  ensure  that  the  new  iterate  is  feasible  with  respect  to 
the  constraints  of  the  original  problem.  The  existence  of  such  a  point  is  guaranteed, 
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since  F(x)  and  its  derivatives  are  infinite  at  the  boundary  of  the  original  feasible  re¬ 
gion.  No  matter  which  of  the  standard  steplength  criteria  are  used,  a  strictly  feasible 
point  satisfying  them  exists. 

Although  the  method  of  Forsgren  and  Murray  could  be  applied  to  the  subproblem, 
it  would  be  useful  if  we  could  avoid  introducing  the  third  level  of  ordering.  Moreover, 
it  would  be  beneficial  if  some  merit  function  other  than  the  barrier  function  could  be 
used.  The  method  of  Forsgren  and  Murray  also  requires  an  initial  feasible  point  (for 
the  original  constraints)  and  in  general  such  a  point  will  not  be  known.  In  the  next 
sections  we  describe  an  approach  that  attempts  to  circumvent  these  difficulties. 


4.7  Newton’s  Method  Applied  to  the  Optimality  Conditions 


The  first-order  optimality  conditions  for  LEP  can  be  expressed  by  the  nonlinear  equa¬ 
tions 


g(x)  -  Attt  =  0, 
Ax  —  b  =  0. 


(4-7.1) 


In  order  to  simplify  the  problem  of  finding  an  initial  point  satisfying  Ax  =  b,  we  have 
chosen  to  treat  these  equations  directly.  Newton’s  method  applied  to  these  nonlinear 
equations  leads  to  a  KKT  system  similar  to  (4.5.1). 

Let  the  optimality  equations  (4.7.1)  be  written  as 


/(z,tt) 


/ p(x)-  ATn\ 
y  Ax  —  b  J 


=  0. 


Newton’s  method  applied  to  (4.7.2)  corresponds  to  the  iteration 


J{xk,Trk) 


-f(xk,Tc  Jt), 


/  Zfc+i 


(4.7.2) 


(4.7.3) 


where  J(x k,ftk)  is  the  Jacobian  of  f(x,  n)  at  (xk,Kk)  and  0  <  a  <  1.  The  linear 
system  in  (4.7.3)  is  analogous  to  the  KKT  system  for  optimization  with  nonlinear 
objective  functions,  and  is  of  the  form 


/ HF(xk)  AT\  /-p\  _  f gL\ 

\  A  °/\9/  l7/ 


(4.7.4) 
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where 

9l  =  g(xk)  -  ATi rfc 

is  the  gradient  of  the  Lagrangian  for  LEP,  and 

r  =  Axk  —  b 

is  the  constraint  residual.  If  the  Jacobian  at  x*  is  nonsingular,  Newton’s  method 
converges  quadratically  to  a  stationary  point  when  (z0,  tto)  is  sufficiently  close  to  the 
solution. 

In  the  following  we  shall  use  the  notation  (p,q)  to  denote  the  column  vector 

(; f  <iT)T. 

LEMMA  4.7.1.  When  the  vector  (p,q)  is  obtained  via  (4-7.3)  and  a  sufficiently 
small  a  is  chosen  for  the  steplength,  every  nonzero  component  of  the  vector  ( gL,r ) 
decreases. 


Proof.  The  proof  of  this  fundamental  property  of  Newton’s  method  follows  from 
the  Taylor  series  expansion  of  /(i*,? r*). 

Define  the  following  notation:  /  =  f(xk,irk),  /  =  f(xk+i,  7r*+1 ),  r  =  Axk  —  b , 
f  =  Axk+i  ~b,gL~  VF{xk)  -  ATirk,  gL  =  VF(xk+1)  -  ATirk+ 1,  and  J  =  J(xk,  nk). 
Then 

f  =  f  +  aj(P)+t(a),  (4.7.5) 

where  the  t(a)  term  is  0(a2).  From  equations  (4.7.3)  and  (4.7.5)  we  have 

/=  (1  -o)/  +  <(o). 


For  (4.7.1)  this  implies 

9l  =  { 1  “  a)gL  +  t(a), 
f  =  (1  —  a)r. 

For  a  sufficiently  small  a,  <(a)  is  negligible  and  the  result  follows. 

I 


(4.7.6) 


The  lemma  implies  that  every  nonzero  component  of  r  decreases  for  any  value 
of  a  €  (0, 1].  Zero  components  of  r  will  remain  zero,  but  those  of  gL  may  become 
nonzero,  according  to  t(a).  We  note  that  when  Newton’s  method  is  applied  to  the 
specific  barrier  functions  considered  in  this  thesis,  an  explicit  expression  for  t(a)  can 
be  determined  (see  Section  4.9). 

Although  Lemma  4.7.1  seems  a  satisfying  result,  a  very  small  step  may  be  needed 
if  we  required  all  components  of  /  to  decrease.  A  more  useful  result  is  given  in  the 
following  lemma. 
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LEMMA  4.7.2.  When  the  vector  ( p,q )  is  obtained  via  (4-7.3)  and  a  sufficiently 
small  a  is  chosen  for  the  steplength,  |j(<?i,r)||2  deci'eases. 

Proof.  The  required  result  is  a  specific  case  of  a  standard  result  for  Newton’s  method 
applied  to  f(x)  =  0;  namely,  that  the  Newton  direction  generated  from  Jp  =  —  /  is  a 
descent  direction  for  the  function  M(x)  =  ^U/U^ 

(VM)'p  =  ( JT{fp  =  fjp  =  -/7  <  0. 


I 


For  the  rest  of  this  chapter,  ||.||  will  denote  the  Euclidean  norm.  It  follows  that  a 
steplength  a  can  be  chosen  to  ensure  a  decrease  in  r)||2.  In  other  words, 

Mi{x,  tt)  =  |t ||2  +  ||rj|2 

could  serve  as  a  merit  function  for  determining  a.  The  choice  of  a  will  depend  on  the 
linesearch  criteria.  Almost  any  criteria  will  do.  For  example,  for  the  Goldstein-Armijo 
conditions  [Arm66,Gol67]  it  can  be  shown  that  the  sequence  {xk}  will  converge  to 
a  stationary  point  provided  the  Jacobian  matrix  is  nonsingular  at  all  the  iterates. 
While  this  property  cannot  be  shown  to  be  true  in  general,  we  shall  show  later  that 
for  barrier  functions  with  a  judicious  choice  of  {pk}  and  {£*},  we  can  arrange  for  K 
to  be  almost  always  nonsingular. 

A  key  property  of  M2  is  that  (like  F)  both  the  function  and  its  derivatives  are  infi¬ 
nite  on  the  boundary  of  the  original  feasible  region.  Hence  we  may  choose  a  steplength 
satisfying  standard  criteria  that  ensures  that  the  new  iterate  remains  strictly  feasible 
with  respect  to  the  original  constraints. 

4.8  An  Alternative  Merit  Function 

For  our  specific  nonlinear  equations  (4.7.1),  we  shall  consider  an  alternative  merit 
function,1  namely 

M{x,Te)  =  ||<fcj|  +  ||r||.  (4.8.1) 

LEMMA  4.8.1.  Let  M(x,7r),p  and  q  be  defined  by  (4-8.1)  and  (4-7-4)  respectively. 
Then  VM(x,ir)  projected  along  ( p,q )  satisfies 

(pT  qT)VM(z,ir)  ~  — ||<7l||  —  ||r||. 

'Note  that  a  decrease  in  the  merit  function  does  not  necessarily  correspond  to  a  decrease  in  the 
objective  function  F(x).  However,  if  the  reduced  Hessian  of  F(x)  is  positive  definite  and  r  =  0,  the 
search  direction  is  also  a  direction  of  descent  for  the  objective  function. 


Proof.  Rewriting  M(x,n)  as 


M(x,ir)  =  (gJt/L)'n  +  (rTrj'/2, 


(4.8.2) 


we  have 


dM(x,ir)  u  T  ._1/2dgJgL  1  T  ,-i/tdrTr 

— ar~  =  5(sw‘>  aT +  5(rr)  ■&’ 

dM(x,ir)  _  ,  i  t  s-ind&L 
dir  ~  2(9l9l)  dir  ' 

We  can  simplify  the  terms  in  (4.8.3)  by  differentiating: 


(4.8.3) 


=  2(V<?t)V 

=  2(V(<?(x)  -  ATn))TgL 
—  2  Hp(x)gL 


(4.8.4) 


=  2i4rr. 


(4.8.5) 


From  (4.8.3),  (4.8.4)  and  (4.8.5),  we  now  get 


dM(x,ir)  _  1  dgJgL  1  drTr 

dx  2  II  gL  ||  dx  2||  r  ||  dx  ’ 

=  ^’ifsTI +  '4Vii' 

Repeating  a  similar  procedure  for  the  differentiation  with  respect  to  ir  yields 

dM{x,ir)  _  1  dgTLgL 

dx  2  ||  gL  ||  dir 

=  -Aj^. 


If  we  use  gL  and  r  to  denote  the  normalized  vectors  gL  and  r  respectively,  the  gradient 
of  M(x,ir)  can  be  written  as 


VM(x,  t)  = 


Hf  At  )  ( gL 


-A  0  \  r 
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Finally,  the  projected  gradient  of  M(x,ir)  can  be  determined: 


Hence, 

(PT  qT)VM(x,v)  =  ~|kll  “  IMI- 

■ 

Notice  that  once  x  is  feasible  (i.e.  r  =  0)  the  merit  function  becomes 

M(x,ir)  =  ||<7j, 


and  satisfies 

(pT  9T)V|M  =  -M- 


(4.8.6) 


(4-8.7) 


4.9  Newton’s  Method  Applied  to  the  Barrier  Subproblem 

In  this  section  we  apply  Newton’s  method  to  the  logarithmic  barrier -function  sub¬ 
problems  arising  from  a  QP  in  the  following  standard  form: 

minimize  \xTHx  4-  cTx 

2  (4.9.1) 

subject  to  Ax  =  b,  £  <  x  <  «, 

where  £  and  u  denote  upper  and  lower  bounds.  The  barrier  reformulation  yields  the 
subproblems 


n  n 

minirnize  B{x,  p)  =  cTx  +  ^ xTHx  —  p  ^  ln(xj  —  £j)  ~  ~  xi) 

i=i  j= i 


subject  to 


Ax  =  b. 


(4.9.2) 
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Since  the  equality  constraints  cannot  be  treated  by  a  barrier  transformation,  they  are 
handled  directly.  Denote 


Vj  —  xj  >  zj  —  uj  xj  > 

D,  =  diagfl/Kj),  Du  =  diag(l/z;),  (4.9.3) 

D,  =  D,-  D»,  PB  =  D]  +  Dl 

To  deduce  the  KI<T  system  corresponding  to  (4.9.2)  we  differentiate  B(x,p )  with 
respect  to  xj : 

(a)  gB( x)  =  Bx(x ,  ft)  =  c  +  Hx  -  pDge , 

(b)  HB(x)  =  BXI{x,ft)  =  H  +  pDB, 

Thus,  the  h.KT  matrix  K  has  the  form 

v_(nB  ^  _  (H  +  fiDB  /1T\ 

W  0  /  l  A  0  )■ 

Notice  that  changes  in  x  and  ft  only  affect  the  diagonal  of  the  upper  block  of  K. 
This  implies  that  only  the  diagonal  changes  between  Newton  iterations  and  between 
changes  to  the  parameter  ft.  This  fact  has  significant  consequences  for  both  the  sensi¬ 
tivity  of  the  linear  system  (to  be  discussed  in  Chapter  5)  and  the  efficient  computation 
of  subpiobiem  solutions  (see  Chapter  8). 

We  recall  from  Lemma  4.7.1  that  the  gradient  of  the  Lagrangian  is  given  by 

9 1  =  (1  ~oc)gL  +  t{a), 

where  gL  is  the  gradient  at  Xk+\  and  t(a)  =  0(a2).  In  the  case  of  the  logarithmic 
barrier  function,  we  can  derive  an  explicit  expression  for  <(o).  Writing  gB  =  gB(: Xk+i) 
as  a  function  of  gB  =  gB(xk)  yields 


(4.9.4) 


(4.9.5) 


gB  =  c+  Hx  -  pDge 
—  9- f  oi  Hp  -  fiDge 
=  (g  -  pDge)  -f  pDge  +  aHp  -  pDge 
=  gB  +  aHp  +  fi{Dg  -  Dg)e. 

From  (4.7.4)  we  obtain 

ATq  =  9l  +  Hgp , 

and  from  (4.9.7)  and  the  definition  of  gL, 

9l  =  9b~  Attt 

=  9b~  ATn  -  aATq 
=  9b  -  ATir  -  a(<7t  +  HBp). 


(4.9.6) 


(4.9.7) 


(4.9.8) 
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Using  expression  (4.9.6)  for  gB  and  (4.9.8)  we  have 


9l  =  (<7b  -  ATit)  -  agL  +  a(Hp  -  Hgp)  +  p(Dg  -  Dg)e 
=  (1  -  <*)9l  ~  pcxDBp  +  p(Dg  -  Dg)e. 

The  nonlinear  term  in  (4.9.9)  is  therefore 

t{a)  =  p(Dg  -  Dg)e  -  potDBp. 

Each  component  of  t(a)  can  therefore  be  expressed  as 


t{a)j  =  p 


1 


1 

- + 


1 


1  1 


Vj  ( Vj  +  apj )  Zj  '  {z,  -  apjjj  M°Pj  [y]  +  z) , 


(4.9.9) 


=  9aPj 


(  1  1  \  _  /J_  _1_ 

\y iivi  +  QPj) +  ziizi  -  <*Pj))  11011)3  \y] +  z] 


=  P<*Pj  | 

2  2 
=  papj 


_ <fPi _ QPj  \ 

zj(zi  ~  »Pj)  y]{yj  +  QP j ) / 

izj(zi  ~  QPj)  y](Vj  +  QPj))  ' 


(4.9.10) 


For  components  corresponding  to  free  variables,  the  nonlinear  term  is  not  present  and 
hence  gLj  decreases  monotonically  for  any  value  a  in  (0,1).  For  variables  with  lower 
(upper)  bound  only,  we  see  that  t(a)j  is  large  when  either  the  quantity  ijj  ( Zj )  or  the 
quantity  ijj  +  apj  ( zg  —  apf)  is  small.  That  is,  when  we  are  near  a  bound  and  //  is  not 
small,  the  a  necessary  to  guarantee  a  decrease  may  be  quite  small.  We  shall  return 
to  this  issue  in  Chapter  8. 


4.10  Backtracking  Linesearch 

Given  the  search  direction  ( p,q ),  we  choose  a  steplength  a  to  reduce  the  univariate 
function 

=  M(x  +  ap,  7T  +  0;<]r), 

using  a  simple  backtracking  linesearch  (Dennis  and  Schnabel  [DS83]).  Some  initial 
steplength  a  =  a0  is  accepted  if  the  following  test  is  satisfied: 


<Kq)  <  *(0)  +  7 Q^(0), 


4-1 1.  The  Inertia  of  K 


51 


where  7  is  a  constant  controlling  the  accuracy  of  the  linesearch.  A  typical  value  is 
7  =  0.5.  Otherwise  the  test  is  repeated  for  the  sequence  {(^)'oro,  i  =  1,2,...}  until 
the  test  is  satisfied.  We  have  shown  in  Section  4.8  that  the  chosen  function  M(x,  7r) 
satisfies 

tf'(0)  =  -#0), 

so  that 

<f>(a)  <  (1  —  7<*)<£(0). 

This  result  is  interesting  because  it  provides  a  bound  on  the  reduction  in  ||/(a;,  7r)|| 
at  the  new  point. 

The  simple  backtracking  strategy  is  typically  both  efficient  and  convenient  for 
barrier  functions.  We  may  choose  a0  =  min(l,  /?QmaX),  where  amM  is  the  maximum 
feasible  step  along  p  and  0  <  (3  <  1,  and  we  are  then  fissured  of  determining  a 
steplength  that  guarantees  the  new  iterate  is  strictly  feasible.  For  iterates  not  close 
to  a  minimizer  it  is  often  the  case  that  (3amikX  <  1.  The  initial  step  then  usually 
satisfies  the  termination  test. 

4.11  The  Inertia  of  K 

It  has  been  essential  to  the  discussion  of  Newton’s  method  applied  to  the  optimality 
conditions  that  K  has  inertia  (n,  m,  0).  In  general  this  cannot  be  guaranteed.  In 
this  section  we  discuss  how  to  ensure  that  K  has  the  correct  inertia  in  most  cases, 
and  what  steps  may  be  taken  on  those  occasions  when  it  does  not.  It  should  be  noted 
that  difficulties  can  arise  only  when  ZTHZ  is  not  positive  definite. 

First  note  that  K  can  be  singular  if  and  only  if  ZTHBZ  is  singular.  Moreover,  if 
ZTHgZ  is  positive  definite  then  K  has  the  required  inertia.  We  can  certainly  ensure 
this  at  the  initial  point  simply  by  choosing  po  sufficiently  large,  since 

ZtHbZ  =  ZTHZ  +  pZTDZ 

where  D  is  positive  definite.  We  shall  endeavor  to  take  advantage  of  this  relationship 
to  ensure  that  K  has  the  required  inertia.  If  p  is  large  enough,  I\  has  the  required 
inertia  at  all  points  in  the  level  set  {  0  |  M(x,p)  <  M(x0,p)}.  It  follows  that 
convergence  to  a  point  satisfying  the  required  conditions  is  possible  for  p  =  p0.  It  is 
nontrivial  to  select  a  value  of  po  that  satisfies  our  criterion.  However,  there  is  a  simple 
remedy  in  practice.  If  the  initial  choice  for  po  proves  to  be  too  small  (because  the 
inertia  of  K  is  incorrect)  then  po  can  be  increased  until  I\  has  the  required  inertia. 

Another  useful  relationship  is  that  the  reduced  Hessian  is  positive  definite  in 
the  neighborhood  of  a  minimizer  at  which  the  reduced  Hessian  is  positive  definite. 
Suppose  we  are  within  such  a  neighborhood.  If  p  is  reduced  only  slightly,  the  reduced 
Hessian  of  the  new  barrier  function  will  also  be  positive  definite.  This  suggests  the 
following  strategy  for  ensuring  that  I\  has  the  correct  inertia: 
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•  Select  a  large  value  for  p0  and  increase  it  if  necessary  until  K  has  the  required 
inertia. 

•  Whenever  p  is  reduced,  if  I\  does  not  have  the  required  inertia  at  the  new  value 
of  p,  change  p  to  be  less  than  the  old  value  but  greater  than  its  current  value. 

It  also  suggests  that  the  criterion  for  reducing  p  should  be  quite  stringent.  However, 
if  a  large  po  is  chosen  and  the  neighborhood  for  which  ZTHBZ  is  positive  definite  is 
large,  it  may  be  possible  to  reduce  p  quite  significantly,  and  without  the  need  to  find 
a  close  approximation  to  a  minimizer.  (See  Figures  4. 1-4.3.) 

Two  questions  remain  to  be  answered.  Firstly,  we  need  to  show  that  a  significant 
reduction  in  p  is  always  possible.  Secondly,  we  need  to  know  how  to  proceed  if  K  does 
not  have  the  required  inertia.  Note  that  the  latter  is  only  of  concern  when  p  ^  p0  or 
it  is  not  the  first  iteration  after  p  has  been  changed. 

There  is  no  loss  of  generality  if  we  assume  r  =  0  at  any  iteration  for  which  K 
does  not  have  the  required  inertia,  and  if  K  remains  sufficiently  nonsingular  (so  that 
a  is  bounded  away  from  zero).  Since  r  *—  (1  —  a)r,  r  —*  0  and  once  it  reaches  zero  it 
remains  zero. 

By  having  a  sufficiently  stringent  termination  criterion  for  p  =  po,  it  is  always 
possible  to  ensure  that  r  =  0  for  p  =  po  and  hence  for  all  p  <  p0.  Note  that  even 
if  r  ^  0,  a  point  satisfying  r  =  0  would  eventually  be  found  provided  the  weaker 
condition  of  K  being  nonsingular  was  always  satisfied. 

Reusing  Factors  of  K. 

The  discovery  that  I\  does  not  have  the  correct  inertia  may  be  made  during 
the  process  of  computing  the  LBLT  factorization  of  I\.  We  could  then  employ  the 
pivoting  strategy  suggested  by  Forsgren  and  Murray  [FM90],  but  the  following  lemma 
suggests  that  a  suitable  descent  direction  or  a  direction  of  negative  curvature  may  be 
obtained  by  decreasing  p  to  p  (say)  and  using  the  current  factors. 

LEMMA  4.11.1.  Letp  be  defined  as  follows: 

(“M’)(-i) 

and  assume  p  jfc  0  and  K{p)  is  nonsingular.  If  p  <  p  then  at  least  one  of  the  following 
inequalities  holds: 


pTHB(p)p  <  -7  or  gB{p)Tp  <  -7, 


where  7  >  0. 


4-11.  The  Inertia  of  K 
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Proof.  We  have 

Hb(h)  =  HB{p)  +  (/i  -  ji)D, 

where  D  is  a  positive  definite  diagonal  matrix.  It  follows  that 

Hs{p)p  +  it*  ~  P)Dv  ~  ATq  =  -gL(p). 

Premultiplying  by  pT  gives 

pTHB(fi)p  =  ~pTgB(p)  ~  (p  ~  p)prDp, 

so  that 

PT9b(p)  +  pTHB(fi)p  =  -/?, 

where  (3  =  (p  —  h  )pTDp  >  0. 

Clearly,  either  pTgB{p)  <  or  pTHB(ft)p  <  —\(3-  Setting  7  =  gives  the 
required  result. 

I 


Note  that  if  pTIIB{fx)p  <  —7  and  pTgB(p)  >  0  then  we  must  have  pTH0(p)p  <  —7 
and  pTgB{p )  <  0,  where  p  =  —p.  Thus,  if  we  have  a  direction  of  negative  curvature 
we  can  always  construct  a  direction  that  is  a  non-ascent  direction  and  a  direction  of 
negative  curvature. 

The  significance  of  this  result  is  two-fold.  Firstly,  we  are  able  to  compute  a  suitable 
descent  direction  and/or  a  direction  of  negative  curvature.  Secondly,  we  can  do  so 
without  the  need  to  refactorize  the  KKT  matrix.  The  assumption  that  p  /  0  and 
A'(/0  is  nonsingular  is  not  unduly  restrictive.  If  either  condition  does  not  hold  at  x , 
there  exists  0  6  (0, 1)  such  that  K{0p)  is  nonsingular  and  g^Op)  0.  This  last  result 
implies  p{0g)  ^  0.  It  is  of  course  necessary  to  refactorize  the  KKT  matrix  if  A'(/i) 
proves  to  be  singular.  The  following  lemma  shows  that  the  set  of  values  0 ,  for  which 
K{0p)  is  singular,  is  finite. 

LEMMA  4.11.2.  If  I((x,p)  is  singular  there  exist  at  most  n  —  m  values  of  0  such 
that  I\(x ,  Op)  is  singular. 

Proof.  I\  is  singular  if  and  only  if  ZTH0Z  is  singular  (see  Forsgren  and  Murray 
[FM90]).  Now  ZTIIn(0p)Z  =  ZrIIB(p)Z  +  (0  —  1  )pZTDZ.  where  D  is  nonsingular 
and  positive  definite.  Hence 

ZTHD(0,p)Z  =  ZrIID{p)Z  +  (0-  1  )RrR 

=  Rt( R~TZTIIB(p)ZR~ 1  +  (0  -  1 )/)/?. 


where 


RtR  =  pZTDZ. 
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Thus,  ZTHB(0,p)Z  is  singular  if  and  only  if(0—  1)  is  equal  one  of  then— m  eigenvalues 
of  R-TZTHB{p)ZR~l. 

I 

Asymptotic  Reduction  of  p. 

We  use  the  notation  K/H  to  denote  the  Schur  complement  of  H  for  a  matrix  K 
partitioned  as  follows: 


K  = 


K/H  =  M  -  Air'A7 , 


(4-11.1) 


where  H  is  assumed  to  be  nonsingular. 

To  show  that  we  are  not  inhibited  in  the  asymptotic  rate  at  which  p  can  be  reduced 
we  first  need  the  following  lemma. 


Lemma  4.11.3.  If 


*  FR 


■( 


H FR 

•4fr  0 


lias  inertia  (nFR,  m,  0),  where  HVR  is  an  nFR  x  nFR  symmetric  matrix  and  „4FR  is  an 
m  x  nFR  matrix  of  rank  m,  then  there  exists  7  <  oo  such  that  the  inertia  of 


I\  = 


'//  +  7/PX  A* 

A  0 


is  (n,  m,  0),  where  II  is  an  n  x  n  matrix  and  A  is  an  m  x  n  matrix  such  that 

II  = 


/fx  = 


Hfk 

"T\ 

ft 

Hfr  J 

•4fx, 

•4fr)j 

(1 

U 

or 

and  \\H\\  4-  jj/l||  <  oo. 
Proof.  We  have 


In(I<)  =  In[Hn  +  7 /)  +  In(K/[Hn  +  7 /))• 
If  7  >  -Amin,  where  Am;n  is  the  smallest  eigenvalue  of  //Fx,  then 


In(HP%  +7 /)  =  (nFX,  0,  0). 


(4.11.2) 


(4.11-3) 


-f 
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DO 


By  definition, 

KRHrx  +  -tl)  =  Kn  -  ^  ^  j  (H„  +  il)-'{UT,  AjJ, 

and  have 

Jim  I!  f^j  (H„  +  /&)  1|  =  0. 

Hence  there  exists  M  <  00  such  that 

In(I</(Hrx  +  7/))  =  /n(  A'fr)  (4.1 1.4) 

for  7  >  M.  It  follows  from  this  result,  (4.11.2)  and  (4.11.3)  that  In(K)  =  (n,  in.  0). 
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We  now  show  that  eventually  fi  can  be  reduced  at  an  arbitrarily  fast  rate.  We 
assume  for  simplicity  that  the  QP  (4.9.1)  has  only  lower  bounds  and  that  these  arc 
zero.  In  the  following  lemma,  x[p)  denotes  the  current  iterate,  ft  the  current  barrier 
parameter,  and  ps  the  next  barrier  parameter  to  be  used  (ps  <  p). 

LEMMA  4.11.4.  Lei  x  be  a  strong  local  minimum  of  OP  at  which  we  have  strict 
complementary  slackness.  Let  x(p)  be  such  that 


In(Ka{x{p),p))  =  (n,  m,  0), 

]»2»II*(/0“**(#0II  =0 

and 

-x*(/i)||  =  0, 

win  IX 

’[a  0 

If  Us  =  8{p)p,  where  $(p)  is  the  smallest  positive  scalar  such  that  Ini  I\B  (Tipups))  = 
(n.  m,  0),  then  lim^^j/?(/i)  =  0. 

Proof.  There  is  no  loss  of  generality  if  we  assume  x*  =  (x*x,x*R).  where  z*x  denotes 
the  variables  on  a  bound  and  x*R  denotes  the  variables  not  on  a  bound.  We  shall  also 
use  the  suffices  to  denote  a  partition  of  //  and  ,4,  as  in  the  preceding  lemma. 

It  follows  from  the  assumptions  on  x*  and  x(p)  that 


lim  p 

ii-0 


/  (  J-FR  C  /* 


f  1.1  I. 
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and 

Wmp/ix^in))]  =  oo. 

Consequently,  there  exists  e  >  0  such  that  if  p  <  e  then  /i/(xFR(/x))j  >  0  for  all  j, 
where  0  <  oo. 

It  follows  from  Lem  ,  a  4.11.3  and  the  assumptions  on  x*  that  there  exists  7  such 
that  In(K^)  =  (n,  m,  0),  where 

H  +  7/fx 
A  0 


By  definition, 


(H  +  Mr  "T  An\ 

h  //„  +  x;}  aJr  , 

\  AFx  Afr  0  ) 


where  A'FX  =  diag(xFX(/i))  and  ATr  =  diag(xFR(/z)).  It  follows  from  (4.11.5)  that 
ln(I\)  =  (nKR,  ?7i,  0),  wheie 


K  = 


L^FR  +  ,5(/0< 


•  *FR 


Therefore,  provided  (3(p)p/xrX{p)2j  >  7  for  all  j, 

Ir.{Kg(x(p),pN))  =  In(K,r)  -  (n,  m,  0). 

Since  fi/xFX(p)j  =  0  for  all  j,  we  have  lim„_o 0(p)  =  0. 
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4.12  A  '  ,’mple  Example 

Consider  solving  the  following  QP  by  BARALG: 

minimize  Q{x,y)  =  [x  +  y  -  3)2  --  (x  -  y  -  2)2 
subject  to 

3x  +  y  —  l  >0 
x  —  y  ■+■  1  >0 
—x  —  y  5  >  0 
-1  +  3?/  +  4  >  C 
x  >  0 
y  >  0. 


4-12.  A  Simple  Example 
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Figure  4.1:  Contours  of  B{x,p)  for  p  =  10 

Figures  4. 1-4.3  show  contours  of  the  logarithmic  barrier  function  for  decreasing 
values  of  the  barrier  parameter:  p  =  10, 1,0.1.  Clearly  for  p  =  10  the  barrier  function 
is  convex.  Although  the  barrier  function  is  not  convex  for  p  =  1  the  minimizer  is  the 
only  stationary  point.  It  follows  tlut  the  first  subproblem  could  have  been  terminated 
at  any  stage.  Even  though  for  /x  =  0.1  the  function  now  has  more  than  one  stationary 
point,  tha  required  minimizer  could  be  found  provided  the  initial  point  was  a  quite 
modest  approximation  to  the  minimizer  for  the  second  subproblem.  Had  the  reduction 
in  p  been  less  severe,  an  even  cruder  approximation  W  the  minimizer  of  the  second 
subproblem  would  suffice. 
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Figure  4.2:  Cont^'”"?  of  B(x,p)  for  p  =  1 


Figure  4.3:  Contours  of  B(x,p)  for  \i  —  0.1 


Chapter  5 

Sensitivity  Analysis 


In  Chapters  3  and  4  we  have  described  a  model  barrier  algorithm  that  theoretically 
converges  to  the  solution  of  a  QP.  To  prove  that  the  algorithm  is  numerically  viable,  we 
must  show  that  the  solution  to  the  subproblems  can  be  computed  in  a  stable  fashion. 
To  this  end,  we  perform  a  detailed  sensitivity  analysis  of  the  I<KT  system  arising  from 
barrier  methods.  A  key  result  of  this  analysis  is  that  the  ill-conditioning  normally 
associated  with  the  Hessian  in  barrier  methods  is  benign  when  the  logarithmic  terms 
are  applied  only  to  simple  bounds.  Thus,  a  stable  barrier  algorithm  is  realized. 

5.1  Larrier  Methods  Applied  to  General  IQP  Problems 

In  general,  unconstrained  minimization  techniques  are  not  well  suited  for  minimiz¬ 
ing  barrier  functions  (see  [Mur71a]).  This  is  primarily  due  to  the  progressive  ill- 
conditioning  of  the  Hessians  as  the  solution  is  approached.  In  this  section  we  briefly 
illustrate  the  structure  of  the  Hessian  and  explain  why  its  condition  number  is  un¬ 
bounded.  In  later  sections  we  prove  that  accurate  subproblem  solutions  can  be  ob¬ 
tained  despite  this  ill-conditioning  when  the  barrier  terms  are  applied  only  to  simple 
bounds. 

Consider  the  general  IQP  problem, 

IQP  minimize  F(x)  =  cTx  +  \xTIIx 

***"  2  (5.1.1) 

subject  to  Ax  >  (3, 

where  the  Hessian  H  is  an  n  x  n  symmetric  matrix,  A  is  anmxn  matrix,  and  c  and 
/?  are  vectors  of  dimension  n  and  m  respectively.  Let  x *  denote  the  minimizer  and 
let  A  be  the  rh  x  n  mot.rix  ssociated  with  the  constraints  that  are  active  at  x*. 

Applying  a  logaritl.  >rrier  function  to  the  inequality  yields  the  following  un¬ 
constrained  minimization  problem: 

m 

minimize  B(x,  fi)  =  cTx  +  \xTIIx  —  ^  \n(ajx  —  /?,),  (5.1.2) 

1=1 

where  aj  refers  to  the  i-th  row  of  A.  By  definition  of  an  unconstrained  minimizer, 
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the  following  relation  holds  when  x  =  x*(p): 


m  a- 

VB  =  c  +  Hx  -  pT~  =  g  -  A1 

jZ I  ri 


i 


ig/rm  J 


=  0, 


(5.1.3) 


where  rt-  =  afx  —  fa.  Thus,  the  gradient  of  F(x)  at  x*(p)  is  a  nonnegative  linear 
combination  of  all  the  constraints.  Further,  if  the  constraint  afx  >  fa  (of  the  original 
problem)  is  not  active  at  x* ,  its  corresponding  coefficient  /x/r,-  must  approach  zero 
(because  r,-  is  bounded  away  from  zero  in  a  neighborhood  of  x*).  On  the  other  hand, 
if  the  constraint  is  active  at  x* ,  it  can  be  shown  that  its  coefficient  approaches  the 
corresponding  Lagrange  multiplier  of  the  original  problem  (assuming  A  has  full  row 
rank).  That  is,  for  sufficiently  small  p  it  follows  from  (5.1.3)  that 

=  +  0{lt) ■ 

To  arrive  at  the  Hessian  of  the  barrier  function,  we  differentiate  (5.1.3): 

>/**? 


Hb  =  H  +  A1 


Hhm- 1 


A. 


We  shall  focus  on  the  values  of  HB  at  x*(p)  as  p  —*  0. 

LEMMA  5.1.1.  Let  Hb  be  the  Hessian  of  the  barrier  function  (5.1.2)  evaluated  at 
x*(p).  Let  A  be  the  fhxn  submatrix  of  A  associated  with  the  constraints  active  at  x* , 
and  assume  A  has  full  row  rank  fh.  If  0  <  fh  <  n,  as  the  solution  of  the  original  prob¬ 
lem  (5.1.1)  is  approached,  the  condition  number  of  the  Hessian  becomes  unbounded. 
Further,  as  p  — *  0,  the  Hessian  has  fh  unbounded  eigenvalues  with  eigenvectors  in  the 
range  of  AT  and  (n  —  fh)  bounded  eigenvalues  with  eigenvectors  in  the  null  space  of 
A. 


Proof.  For  inactive  constraints,  r,-  is  bounded  away  from  zero  and  hence  p / r;  and 
p/r i2  approach  zero  as  x*(u)  approaches  x*.  For  active  constraints,  the  quantity 
p/ri  approaches  the  Lagrange  multiplier.  Thus,  fh  ratios  p/r]  must  be  unbounded 
as  r;  — >  0.  This  implies  that  if  0  <  fh  <  n,  the  dominant  rank-deficient  matrix  causes 
the  condition  number  of  the  Hessian  to  become  unbounded  and  the  result  follows. 

I 

Finally,  we  remark  that  the  ill-conditioning  of  the  Hessian  matrices  of  barrier 
function  does  not  result  from  the  barrier  parameter,  but  from  the  singularities  caused 
by  tue  active  constraints.  (See  [Mur71b]  for  more  details.) 
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5.2  Standard  Formulation  of  QP  Problems 

In  this  thesis,  we  only  consider  the  barrier  method  applied  to  QP  problems  in  standard 
form: 


minimize  Q(x)  =  cFx  +  \xTHx 
*&n  2  (5.2.1) 

subject  to  Ax  =  b,  l  <  x  <  u. 

As  we  illustrate  in  this  chapter,  the  standard  formulation  is  of  fundamental  impor¬ 
tance  to  the  successful  application  of  the  barrier  method.  An  intu:'  ‘  e  understanding 
can  be  obtained  by  examining  the  Hessian  of  the  barrier  funci:o*i.  bor  (5.2.1)  the 
barrier  function  is  given  by  (4.9.2)  and  the  corresponding  Hessian  is  given  by  (4.9.4): 

HB  =  H  +  fiDB.  (5.2.2) 


Notice  that  in  this  special  case  the  effects  of  the  logarithmic  terms  appear  only 
on  the  diagonal  of  the  Hessian.  The  m  singularities  of  Lemma  5.1.1  are  therefore 
separable.  It  is  in  fact  this  decoupling  that  allows  accurate  solutions  to  be  computed. 
In  the  sensitivity  analysis  that  follows,  we  make  specific  use  of  this  particular  form  of 
the  Hessian. 

Let  Kv  =  w  denote  the  I<KT  system  associated  with  the  barrier  subproblem 
arising  from  the  standard  formulation  (5.2.1). 

We  have  said  HB  is  ill-conditioned.  It  is  clear  from  the  nature  of  the  ill-conditioning 
of  Hb  that  K  is  also  ill-conditioned.  We  show  that  v  can  be  computed  reliably 
nevertheless. 

5.3  Standard  Sensitivity  Analysis 

Given  the  KKT  system  Kv  =  w,  our  aim  is  to  examine  how  perturbations  in  I\ 
and  w  alFect  the  solution  v.  In  this  section  we  recall  some  results  from  the  standard 
perturbation  theory  for  solving  linear  systems. 

Let  SK  and  6iv  be  perturbations  for  K  and  w  respectively,  so  that  the  perturbed 
KKT  system  can  be  expressed  as 

(K  +  6K)(v  +  Sv)  =  (w  +  Sw),  (5.3.1) 

which  we  shall  write  as 

Kv  ~  w.  (5.3.2) 

Even  when  K  is  nonsingular,  I\  may  be  singular  if  SI\  is  not  restricted.  It  is  known 
[GLS9]  that  if  SK  is  such  that  ||<5A'||  <  1/||  AT- 1  [|,  then  the  following  hold: 
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•  K  +  SK  is  nonsingular; 

•  (A  +  6K)-1  =  (/  +  K~16K)~1K~1  =  '$%0(K-16K)iK-1; 

•  The  relative  error  in  the  solution  of  (5.3.1)  can  be  bounded  by 


INI  < . 

IMI  —  1  —  cond  (K)p(K) 


cond  (A) 


(p(K)  +  p(w))  > 


(5.3.3) 


where  cond  (A')  =  ||A  1  ]|  ||/^||-  The  quantities  p(K)  =  ||6A'||/||A||  and  p(w )  = 
||6u;||/||m||  constitute  the  relative  changes  in  K  and  w  respectively. 


The  condition  number  cond(A')  attempts  to  measure  the  worst  possible  effect  on 
the  solution  v  when  A  and  w  are  perturbed  by  a  small  amount. 

Unfortunately,  as  we  pointed  out  in  the  previous  section,  for  the  linear  systems  in 
consideration,  cond; A)  — »  oo  as  p  — »  0.  Ironically  the  system  becomes  increasingly 
ill-conditioned  precisely  when  we  are  more  interested  in  solving  it  accurately,  that  is, 
as  we  approach  the  solution. 


5.4  Useful  Identities  and  Inequalities 


A-'-B-1  =A~l{B-A)B~1 

OO 

(/-A)-'  =£A‘ 

i=0 

1  00 

(A  +  iD)-'  =  -£>-•£  (-UV^AD-1)' 

I  i=0 

(A  +  7 D)~l  =  7 -'D'1  -  7_2Z?'MZ?-1  +  0(7"3) 


5.5  Analysis  of  the  KKT  System 


The  KKT  system  for  the  barrier  subproblem  is 


Y) 


A  = 


(ri- 


(5.5.1) 


where  HB  =  H  +  pDB  and  DB  =  D\  +  D\  ( see  Equation  (4.9.3)).  Without  loss  of  gen¬ 
erality  and  for  clarity  of  exposition,  we  will  consider  only  problems  with  nonnegativity 
bounds,  so  that  for  the  remainder  of  this  chapter,  DB  =  diag(l/z2). 

We  are  interested  in  analyzing  the  behavior  of  the  solution  of  system  (5.5.1),  es¬ 
pecially  the  limiting  behavior  as  the  barrier  parameter  tends  to  zero.  For  the  purpose 
of  the  analysis  we  will  consider  a  particular  ordering  of  the  KKT  equations.  The 
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motivation  is  to  distinguish  (or  treat  separately)  those  elements  of  DB  that  become 
unbounded  as  we  approach  the  solution.1  Let  us  assume  that  the  rows  and  columns 
of  D0  have  been  permuted  so  that  the  unbounded  elements  appear  first.  If  we  define 

Q  =  x?  =  minx? 

'  j  3 

7  =  pIP 

dj  =  PI*) 

D  =  diag(dj), 


then 

•  Hb  =  H  +  / lDb  =  H  +  7  D] 

•  0<dj<  1,  and  ||D||  =  1. 


This  ordering  induces  the  following  partition  in  the  matrices  D,  HB  and  K : 


D  = 


Hb  =  H  +  7 


* 


K(i)  = 


(Ht+iD,  Hi  /lf\ 

#21  #2  +  7#2  A%  , 

<  A\  Ai  ) 


where  ~yD\  — ►  oo  as  y.  — >  0,  in  contrast  to  7#2>  which  by  construction  remains 
bounded  as  y  — »  0. 

Without  loss  of  generality  we  can  assume  that  the  original  QP  has  been  scaled  so 
that  ||//||  1  and  \\A\\  1.  Therefore,  the  system  of  interest  (5.5.1)  becomes 


,,,,,,  (h  +  -,dat\ 

K(t )v(i)  =1  1 1>(7)  = 


(5.5.2) 


where  7  is  a  positive  scalar;  K  and  D  are  square  matrices;  I\  is  symmetric,  indefinite 
and  ||  I\  ||~  1;  D  is  diagonal,  positive  semi-definite  and  ||  D  ||=  1.  With  the  chosen 
partitioning  we  also  have  ||Z?i||  =  1  (since  x,  will  be  in  D\).  We  are  interested  in  the 
quantity 

lim  v{-y).  (5.5.3) 

'>—’*00 

'If  there  are  no  unbounded  elements  in  D0,  i.e.,  if  none  of  the  components  of  x  approaches  a 
bound,  standard  sensitivity  analysis  applies. 
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Equation  (5.5.2)  is  equivalent  to 


where 


K i  \  ( 

Kz  K2  )  \t*(7)  / 

/Ci  =  ^+7A, 


(5.5.4) 


K 2  = 


I<3  = 


Hi  +  7D2  A? 


0  )  ' 


Note  that  J\2  is  a  KKT  system  of  smaller  dimension,  namely  the  submatrix  obtained 
from  the  original  system  after  eliminating  variables  that  are  near  their  bounds. 

Let  S  =  K/Ki  be  the  Schur  complement  of  I(  with  respect  to  I(t.  The  system 
(5.5.4)  can  be  written  as 


AW7)  =  wi  ~  Hi v2{ 7), 

Sv  2(7)  =  w2-  I<zI<iXw\. 

In  order  to  find  an  expression  for  (5.5.3),  we  need  to  present  three  lemmas. 


(5.5.5) 

(5.5.6) 


LEMMA  5.5.1.  If  7  is  sufficiently  large,  the  matrix  K 1  is  nonsingular  and  its 
inverse  has  the  form 


Proof: 


Hence, 


*r‘  =  iff’  -  fa's, or' + o(7-3). 


I<i  -  Hi  +  7P1 

=  (/  +  7“,^iA“1)7A. 


(5.5.7) 


/cri=7"1A"1(/+7”1^iVr1- 

If  7  is  sufficiently  large  we  have 


(7  +  r'HtDf1)-1  =  7  -  7~1T717V  +  0(7"2)- 


The  required  result  follows. 

I 


* 

* 
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LEMMA  5.5.2.  If  I(2  is  nonsingular  and  7  is  sufficiently  large,  S  is  nonsingular 
and  its  inverse  can  be  written  as 

S~l  =  I(~l  +  +  0(  7~2). 

Proof:  We  have  from  (4.11.4)  in  Lemma  4.11.3  that  if  7  is  large  enough, 

In(S)  =  In(I(2). 

It  follows  that  for  7  large  enough,  S  is  nonsingular.  In  all  that  follows  we  assume 
that  7  is  always  chosen  suitably  large. 

By  definition  we  have 

5'1  =  (I<2  -  AW)-1 

=  K,'  +  iq'M-l)I<V  +  (5.5.8) 

t=2 

where  AW  =  K^K^Kf.  It  follows  from  Lemma  5.5.1  and  the  definition  of  ^(7) 
that 

-4(7)  =  Kfo-'D;1  -  +  0(7-3)]A'3t 

=  +  Oft-2).  (5.5.9) 

7 

The  required  results  follows  from  substituting  this  expression  for  AW  into  (5.5.8), 

I 

LEMMA  5.5.3.  If  I( 2  and  Dx  are  nonsingular  and  7  >  WD^HiW,  then 
v7(j)  =  K;'w2  +  t'K?K,D?(I<Jk;'w2  -«,,)  +  0(7-2), 

»ift)  =  -£>rV>  -  i<1i<;'w2)  +  Oft-2). 

7 

Proof:  From  (5.5.6)  and  Lemma  5.5.2  we  have 
^2(7)  =  S~1(w2  -  I<zKflw  1) 

=  (Kfl  +  -iqxiuD-xli<Ii<f')(w2  +  A'3/vr1^)  +  o(r2)- 

7 

Substituting  for  A'f 1  from  Lemma  5.5.1  gives  the  first  required  result., 
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Similarly,  from  (5.5.5)  and  Lemma  5.5.1  we  have 

Ml)  =  tffVj  “  KzMl)) 

=  iDr‘(w.-/f,Tt.2(7))  +  0(7-2). 

7 

Substituting  for  v2(7)  gives  the  second  required  result. 

I 

From  the  last  lemma  we  can  derive  the  limiting  value  of  v('f): 


lim  uj(7)  =  0, 

nr— *oo 


lim  Ml)  =  K->  lw2. 

oo 


These  results  indicate  that  when  7  — ►  00  (or  equivalently  when  p  — ►  0)  the  KKT 
system  decouples.  This  is  precisely  what  we  would  obtain  if  we  fixed  the  components 
of  v  that  are  on  a  bound  at  the  solution  and  solve  for  the  reduced  system. 

5.6  Specific  Sensitivity  Analysis 

Here  we  analyze  the  sensitivity  of  the  solution  of  the  KKT  systems  arising  in  barrier 
methods.  It  was  shown  in  the  previous  section  that  these  systems  can  be  expressed 


Hi  +  7 Di  Kj  \  f  ui(7) 

K 3  I<2  J  l  u2(7) 


K 1  Kf 
I<z  I<7 


(5.6.1) 


where  I(\  =  H\  +  7D1 . 

In  the  following  analysis  it  is  tacitly  assumed  that  ||v(7)||  >  0  for  all  7  sufficiently 
large.  We  wish  to  address  how  sensitive  the  solution  of  (5.6.1)  is  to  perturbations  in 
the  matrix  K  and  the  vector  w.  Thus  if 


( K  +  SK)(v  +  Sv)  =  (w  +  Stv), 


(5.6.2) 


where 


SK  = 


SHi  +  1W1  81<l 


and  Sv  = 


we  are  interested  in  how  the  relative  and  absolute  perturbations  behave  when  7  — >  00. 
Since  lim7_00  ||ni(7)||  =  linxy-.^  J|vi (7)  +  <5vi(7)||  =  0>  our  interest  is  in  the  behavior 
of  the  perturbation  Sv2. 
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We  use  the  Schur  complement,  so  that  for  the  original  KKT  system  we  have 

Ktvt  =wi-  I<J v2, 

Sv2  =  w2  —  K3K^1w1,  (5.6.3) 

and  for  the  perturbed  KKT  system 

=W\~  I<Jih, 

Sv  2  =  w2  —  KzKy'wi,  (5.6.4) 


where 

=  vi  +  Sv  i, 
v2  =  v2  +  Sv2 , 

A  i  =  Ki  +  SH\  +  ^SD\ , 

K2  =  I<2  +  SI<2, 

A3  =  A3  +  $  A3, 

S  =K2-KzKyXKl. 

It  follows  from  Lemma  5.5.2  and  the  nonsingularity  of  I\2  (a  consequence  of  the 
nonsingularity  of  K)  that 

lim  5  =  A'2, 

'7— CO 

lim  S  =  K2  +  SI\2. 

■y— 00 

Also  from  Lemma  5.5.1  we  obtain 

lim  A'T1  =  lim  A',-1  =  0. 

7— ‘CO  *  7“*O0 

In  the  limit  (allowing  7  — +  oc),  we  therefore  have  from  (5.6.3)  and  (5.6.4)  that 


K2v2  =  w2, 

(A2  4"  S A 2 ) ( v-j  -j-  <5u2)  —  iu2  d*  Su-2. 


We  can  now  apply  the  classical  analysis  of  perturbed  systems  to  obtain 


ilfa2 1|  <  cond(A'2) 

IMi  ”  1  -cond(A'2)p(A'2) 


(p(  K2)  +  p{iv  2)). 


These  results  show  that  the  sensitivity  of  systems  involving  A'  becomes  the  same 
as  the  sensitivity  of  the  systems  involving  I\2  as  7  — »  oc. 
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5.7  An  Example 

The  analysis  demonstrates  that  provided  the  problem  is  in  standard  form,  the  sensi¬ 
tivity  of  the  solution  to  the  KKT  system  arising  from  the  barrier  subproblem  depends 
in  the  limit  on  the  condition  of  the  underlying  KKT  system  for  the  optimal  active 
set  for  the  original  problem  and  not  on  the  condition  of  K.  Here  we  demonstrate 
by  a  counterexample  that  this  is  not  the  case  for  problems  in  nonstandard  form.  We 
contrast  the  relative  errors  made  when  solving  two  specific  systems  of  equations;  one 
of  the  type  arising  from  problems  in  nonstandard  form  and  the  other  of  the  type 
arising  from  problems  in  standard  form. 

Suppose  the  QP  problem  has  some  general  inequalities,  i.e.. 


minimize  Q(x)  =  cTx  +  \xTHx 
rest*  '  2 


subject  to 


Ax  =  b , 
Ax  >  b. 


(5.7.1) 


When  a  barrier  transformation  is  applied,  the  subproblem  is  of  the  form 


minirnize  B(x,fi)  =  cTx  +  \xTHx  —  /i  ^In  r,(x) 


subject  to 


Ax  =  fc, 


(5.7.2) 


where  r,(x)  =  d,Tx  —  6,. 

Superficially  this  subproblem  looks  identical  to  the  subproblem  for  the  standard- 
form  QP  (4.9.2).  The  key  difference  is  the  form  of  the  Hessian  of  the  two  barrier 
functions.  In  the  standard-form  case,  HB  has  the  form 

Hb  =  //  +  7  D, 

where  D  is  a  diagonal  matrix  satisfying  0  <  ||Z?|[  <  oc.  In  the  nonstandard  form  the 
Hessian  is  of  the  form 

I  In  =H+  1ArDA , 

where  D  is  also  a  diagonal  matrix.  In  both  cases  we  have 

lim  7  =  oc. 


In  the  standard-form  case  there  is  a  loss  of  figures  in  some  of  the  diagonal  elements 
of  H  when  II B  is  computed  in  finite  precision.  The  previous  analysis  shows  that  this 
is  not  of  significance.  For  the  nonstandard  form  there  is  a  loss  of  figures  in  all  the 
elements  of  H  for  which  AtA  has  a  nonzero  element  (which  could  be  all  elements  of 
II).  Eventually,  loss  of  figures  in  the  off-diagonal  elements  of  II  becomes  catastrophic. 
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7 

eB 

1 

1.5e-15 

105 

2.4e-ll 

1010 

1.4e-6 

1015 

2.1e-l 

Table  5.1:  Relative  error  for  systems  arising  from  problems  in  nonstandard  form. 


To  illustrate,  we  first  consider  systems  of  the  form 


(H  +  7Bt£)x  =  d. 


(5.7.3) 


where  H  is  an  n  x  n  symmetric  orthogonal  matrix  and  B  is  an  m  x  n  matrix  with 
m  <  n.  The  matrix  B  (like  //)  has  a  condition  number  of  one,  so  that  errors  ir.  the 
computed  solution  arise  from  the  form  of  equation  (5.7.3).  For  7  ,  the  condition 

number  of  II  +  7 BTB  is  approximately  equal  to  7.  The  specific  choices  of  //,  B  and 
d  are 

r -0.4179  0.4528  0.2045  0.7606  \ 

_  0.4528  0.8554  -0.0653  -0.2429 

0.2045  -0.0653  0.9705  -0.1097  ’ 
k  0.7606  -0.2429  -0.1097  0.5920 , 


l  —0.3844  -0.4856  0.2404  0.7474 

\  0.6325  0.3162  -0.3162  0.6325 


and 


f  4.3 ' 
1.3 
-1.7 
t  0.5; 


We  solved  (5.7.3)  for  a  variety  of  values  of  7,  using  MATLAB  {MLB87]  on  a  DEC 
VAX  workstation  with  approximately  16  decimals  of  precision.  Let  x  denote  the 
computed  solution  an*'  eB  =  J|x  —  x||/|Jx||  its  relative  error. 

It  is  clear  from  Tab  *  5.1  that  the  loss  of  figures  in  the  computed  solution  to 
systems  of  the  form  (5.7.\  :s  close  to  the  bound  predicted  by  the  standard  sensitivity 
analysis. 


We  now  consider  equals.  •>f  the  form 
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7 

eD 

1 

1 .6e- 15 

10s 

6.3e-17 

1010 

1.5e-25 

1015 

1.4e-16 

Table  5.2:  Relative  error  for  systems  arising  from  problems  in  standard  form. 

where  D  is  a  diagonal  matrix  of  rank  m  <  n.  Specifically  we  use  the  same  matrix  H 
of  the  previous  example  and  choose  D  =  diag(l,  1,0,0).  Let  x  denote  the  computed 
solution  and  eD  =  ||x  —  x||/||x||  its  relative  error. 

According  to  the  special  sensitivity  analysis,  the  loss  of  Hgures  in  the  solution 
should  not  depend  on  the  condition  number  of  H  +  7 D  but  on  the  condition  of 

_  /  0.9705  -0.1097  \ 

2  \ -0.1097  0.5920/ 

as  7  — >  00.  This  matrix  has  a  condition  number  of  only  1.8.  It  can  be  seen  from 
Table  5.2  that  as  predicted  the  errors  are  indeed  very  small. 


Chapter  6 

Special  QP  Problems 


6.1  Introduction 

In  this  chapter  we  discuss  various  special  cases  of  QP.  Specifically  we  consider  the  case 
when  ZTHZ  is  positive  semidefinite  and  vaiious  special  forms  of  H.  Much  of  what 
follows  in  this  chapter  is  also  applicable  to  the  general  case  when  the  circumstances 
considered  hold  locally.  A  key  property  when  ZTHZ  is  positive  semidefinite  is  that 
the  minimizer  of  the  barrier  subproblem  is  unique.  It  follows  that  there  exists  a  single 
constrained  stationary  point.  Consequently  we  may  apply  Newton’s  method  directly 
to  the  nonlinear  equations  that  define  the  stationary  point. 

It  is  often  possible  to  take  advantage  of  specific  structure  in  H  when  solving  the 
KKT  system  of  equations.  We  are  particularly  interested  in  the  case  when  H  is 
trivially  invert; ole.  The  last  special  case  considered  is  when  the  only  constraints  are 
simple  bounds. 

Throughout  Sections  6. 2-6. 8  we  assume  that  ZJHZ  is  positive  semidefinite 


6.2  A  Primal  Method 


Let  us  recall  the  primal  barrier  subproblem 


n 

minimize  B(x.u)  =  cTx  +  \xTHx  —  uY'\nx1 
r€3f"  2  f-J  } 

3=  1 

subject  to  Ax  =  6, 

and  the  associated  KKT  equations 


/ H b  Ar\  /  Ax\  (gB  -  Attt\ 
y  A  0  J  ^ -A7r J  \  Ax-b  J 


(6.2.1) 


(6.2.2) 


If  r/JHZ  is  positive  semidefinite,  the  matiix  ZrHBZ  is  positive  definite  and  the  barrier 
subproblem  has  a  unique  solution  corresponding  to  a  unique  constrained  stationary 
point. 
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The  equations  defining  a  constrained  stationary  point  of  (6.2.1)  are: 

c  +  Hx  —  pX~xe  —  ATn  =  0 
Ax  —  b  =0. 

Defining  2  =  pX~xe  gives 


f  2  —  p.X~xe  ^ 
fP(z,x,i r)  =  c+Hx  —  z  —  At 7t  =  0. 
1  Ax  —  b  1 


(6.2.3) 


The  vector  v  will  denote  the  vector  (2,  x,  —n).  The  Newton  direction  Av  =  (A 2,  Ax,  —An) 
satisfies  the  linear  system 

JpAv  =  (6.2.4) 


where 


l  I  pX~2  0  \ 
Jr,  =  -/  H  AT  . 


Apart  from  the  last  block  of  columns  being  multiplied  by  —  1,  Jp  is  the  Jacobian  of  fp. 
We  shall  refer  to  Jp  as  the  Jacobian.  For  the  Jacobian  to  be  well  defined  we  assume 
x  >  0,  It  is  easily  seen  that  if  a  step  a  were  taken  such  that  any  Xj  +  aAxj  —  0  for 
some  j,  then  ||/p||2  =  fpfp  —  00.  Therefore  there  exists  a  minimizer  of 

minimize  {\\fp{v  +  /?Au)||2  |  /?>0}, 

say  /?,  such  that  x  +  $Ax  >  0  if  x  >  0. 

The  next  iterate  Vk+\  is  given  by  ?;*+1  =  w^  +  aAr,  where  Vk  =  ( 2<-,xjt ,  —  tt*)  and  a 
is  chosen  to  ensure  a  sufficient  decrease  in  ||/p||2.  Specifically  we  choose  a  to  satisfy 
the  Goldstein-Armiio  conditions: 

0  <  — 271aAvTjjf fp(vk)  <  ||/P(t/fc)||2  -  ||/P(’>*+i)l|2  <  -2^2oiAvTjJ fp{vk), 

where  71  and  72  are  scalars  satisfying  0  <  71  <  72  <  1.  Since  JpAv  =  — fp(vk )  the 
conditions  rcuuce  to 


0  <  27l<*  <  1  - 


11/pK+i)1 

ll/PK)ll5 


<  272a. 


The  trial  values  of  the  steplength  are  defined  in  terms  of  an  initial  step  = 
min(l,  0am)  and  a  positive  scalar  w  (w  <  1),  where  am  is  the  largest  feasible  step 


6.2.  A  Primal  Method 


73 


along  Av  and  0  <  0  <  1.  Typically  6  =  .99  and  w  ~  0.5.  The  value  of  a  is  taken 
as  the  first  member  of  the  sequence  {uPa^},  j  =  0,  1,  . . . ,  for  which  the  conditions 
are  satisfied.  It  is  because  /?  exists  that  we  can  assert  that  there  are  suitable  points 
satisfying  the  Goldstein-Armijo  conditions. 

Given  that  the  initial  value  of  v,  say  n0,  satisfies  ||/p(t>0)||  <  oo,  it  follows  that 
there  exists  M  <  oo  such  that  (xk)j  >  Mf\i  for  all  k.  Consequently,  ||JP||  is  bounded 
and  Jp  has  a  bounded  condition  number  at  ail  iterates.  Hence  Newton’s  method 
is  well  defined.  It  follows  from  standard  convergence  analysis  that  lim<;_0OUfc  = 
(z*(n),x*(fi),-ir*(n)). 

The  vectors  Ax  and  A?r  are  identical  to  those  defined  by  the  KKT  equations 
(6.2.2).  Indeed  if  Az  is  eliminated  from  (6.2.4)  we  also  eliminate  z ,  giving  the  KKT 
equations 

(H  +  fxX-2  AtW  Ax\  _  (c+Hx-fiX-1 

^  A  0  J  y  —  Air  j  y  Ax  —  b 

Newton’s  method  applied  to  the  nonlinear  equations  (6.2.3)  differs  from  the  preceding 
definition  of  Newton’s  method  (applied  to  the  barrier  subproblem)  in  that  Ax  and  a 
are  used  to  ensure  a  sufficient  decrease  in  ||/p||  rather  than  B(x,fi).  An  advantage  of 
the  new  formulation  is  that  ||/p||  is  easier  to  compute  than  J E?(x,/i). 

By  utilizing  the  Schur  complement  of  H  +  fiX~2  in  the  KKT  matrix  we  obtain 
equations  in  just  Air,  namely 

A(II  +  /iA'-2)-MtAtt  =  b  -  Ax  +  A(H  +  nX~2Y\gB  -  ATx),  (6.2.6) 

where  cjg  =  c+  Hx  —  / iX~1e .  In  contrast  to  the  linear  programming  case,  these  equa¬ 
tions  do  not  appear  to  be  an  attractive  option  for  computing  Att  unless  H  happens 
to  be  diagonal  (cf.  [CLMS90]).  In  Section  6.7  we  show  that  H  can  be  made  diagonal 
for  any  semidefinite  QP. 

It  is  clear  from  (6.2.5)  that  Ax  is  independent  of  2.  A  closer  inspection  of  (6.2.5) 
reveals  that  Ax  is  also  independent  of  n.  It  is  therefore  unnecessary  to  take  the  same 
step  in  the  x  variable^  as  that  taken  in  z  and  tc. 

We  could  define  the  iteration  as  follows: 


Xk+ 1  =  xjt  +  aAx 

Zk+i  =  Zk  <*zAz  (6.2.7) 

tffc+l  =  -Kk  +  a*A7T, 

where  a  is  chosen  as  before,  but  az  is  chosen  to  ensure  Zk+\  >  0  if  Zk  >  0.  The  intent 
is  to  choose  zq  >  0,  which  then  ensures  zk  >  0  for  all  k. 
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In  addition  to  the  sequence  {a:*},  we  may  recur  the  sequence  {xk}  as  follows: 


Xq  — 


Xk+I  =  Xk  +  azAxk, 


where 


Ax k  =  xk  +  Axk  -  xk. 

We  could  choose  az  to  be  the  maximum  feasible  step  (subject  to  it  being  less  than 
unity).  However,  since  we  know  at  the  required  solution  z*  >  0,  we  choose  instead 
to  take  some  step  that  ensures  zk+i  >  0.  We  have  yet  to  formulate  the  best  rules  for 
choosing  az.  Our  purpose  here  is  to  note  that  we  do  have  the  choice.  Our  reason  for 
choosing  az  different  from  a  is  that  we  are  able  to  generate  a  dual  feasible  point  and 
maintain  such  a  point,  as  the  following  lemma  proves. 


Lemma  6.2.1.  If  at  the  K-th  iteration  we  have 

c  +  Hxk  —  z,(  —  ATnK  =  0 
Axt<  =  b, 


(6.2.8) 


it  follows  that 

c  -f  Hxk  —  zk  —  ATirk  =  0 
Axk  =  b 

for  k  >  I(. 

Proof.  It  is  enough  to  show  that  the  result  holds  for  k  =  K  +  1.  By  definition  we 
have 

c  -h  H(xk  +  Axk)  -  (zK  +  Azk)  -  AT(wh-  +  Attk)  =  0 

A{xk  +  Axk)  =  b. 

It  follows  from  these  equations  and  the  assumptions  that 

H( xK  +  Axk  —  xK)  -  Azk  -  AtA*k  =  0 
A(xk  +  Axk  -  xK)  =  0. 

Since  AxK  =  xK  +  AxK  —  xK  we  have 

HAxk  —  A  zK  —  AtA/kk  =  0 
AAxk  =  0. 


It  follows  that 


c  +  Hxk+  ,  -  zK+ ,  -  ATirK+i  =  0 


r 
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An  immediate  consequence  of  this  lemma  is  that  if  ever  a.  =  1 ,  then  all  subsequent 
elements  of  the  sequence  {xk,Zk,Ttk}  are  dual  feasible  (even  if  az  ^  a  at  some  stage). 
At  such  points  we  are  able  to  compute  a  lower  bound  on  the  objective  function  at  the 
solution.  If  ever  a  —  1,  then  all  subsequent  elements  of  are  primal  feasible  and 
we  are  able  to  compute  an  upper  bound  on  the  objective  function  at  the  solution. 
Once  a  primal  and  dual  feasible  solution  are  known,  then  at  every  subsequent  iteration 
we  may  compute  the  duality  gap  given  by 

cTxk  +  \xTkHxk  -  bT nk  +  \xTkHxk, 
which  when  xk  =  xk  becomes 


rp  rp  rp 

c  xk  +  xkHxk  —  b  irk. 

Knowledge  of  the  duality  gap  may  be  used  to  estimate  a  suitable  change  to  the 
value  of  /i  and  as  an  overall  termination  criterion. 

Since  the  sequence  {xfc}  is  unaffected  by  choosing  az  ^  a,  and  since  a  step 
bounded  away  from  2ero  may  still  be  taken,  it  follows  that  we  still  have  lim^oo  xk  = 
However,  it  is  necessary  to  show  that  limi-^oo  zk  =  z*{/t)  and  linu-_0O7r;t  = 

**(/*)■ 

LEMMA  6.2.2.  If  the  sequences  {zk}  and  {t*}  are  generated  according  to  the  above 
rules,  then  lim*-^  zk  =  x*(/r)  and  lim^-oo  * k  —  7T*(/0- 

Proof.  By  definition  we  have 


Azk  -  ftX-k  2Xxk  =  -zk  +  jiX;  'e. 

Hence 

zk  +  A  zk  =  fiXfle  +  gXf2Axk. 

Since  there  exists  e  such  that  ( xk)j  >  e  >  0,  and  since  limjt_^  Axk  =  0,  it  follows 
that 

lim  zk  +  Azk  =  lim  gXf'e  =  z*{(i). 

k—oo  k—*  oo 

It  also  follows  that  there  exists  I\  such  that  az  =  1  for  all  k  >  I\.  Hence  limjt-.^  zk  = 
**(/*)■ 

Since  lim^^  Xxk  —  0,  there  exists  K  such  that  a  =  cu  =  1  for  all  k  >  K . 
Consequently,  for  k  >  K  we  have  xk  =  xk.  It  follows  from  tiiis  result  and  from 
Lemma  6.2.1  that  for  k  >  I\, 


c  +  II xk  -  zk  —  /IT7rjt  =  0. 
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Since  lim*-^  z*  =  z*(/i)  and  limjt_0Oa;fc  —  it  follows  that  limi_00  7rfc  =  n*(p). 

I 

Since  a  unit  step  in  all  the  variables  is  assured  after  a  finite  number  of  iterations, 
it  follows  that  eventually  both  a  primal  and  dual  feasible  point  will  be  identified 
regardless  of  the  value  of  p. 


6.3  A  Dual  Method 


Let  z  =  c  +  Hx  —  ATir.  The  dual  barrier  subproblem  can  then  be  written  as 


minimize  B(x,  n,  z,  p)  =  —  bTn  +  t:xtHx  —  pY'  In  z, 

*’*■*  2  U  (6.3.1) 

subject  to  —  z  +  c  +  Hx  —  A1 k  =  0. 

A  stationary  point  of  (6.3.1)  satisfies  the  equations 


fd{z,x,ir)  = 


(  x  —  pZ  *e  \ 
c  +  Hx  —  z  —  At it 


=  0. 


V  Ax  -  b  J 


(6.3.2) 


This  system  is  equivalent  to  (6.2.3)  in  the  sense  that  the  solution  to  both  systems 
with  x*(p)  >  0  is  identical.  We  could  have  arrived  at  (6.3.2)  by  multiplying  the  first 
n  equations  of  fp  in  (6.2.3)  by  XZ~l.  The  Jacobian  of  fd  is  given  by 


Jd  = 


( pZ~ 2  I  0  \ 
-/  H  At 

[  0  A  0  J 


If  we  apply  Newton’s  method  to  (6.3.2)  then  the  following  system  is  solved  at  each 
iteration: 


f  A  z^\ 

^  x  —  pZ~le  N 

Jd 

Ax 

=  — 

c  +  Hx  —  z  —  AT7T 

\  Ax  -  b  j 

Observe  that  ||/d||  =  oo  whenever  an  element  of  z  is  zero.  The  next  iterate  is  given 
by  taking  a  step  a  along  (Az,  Ax,  Air),  and  we  now  require  z  to  remain  positive. 
The  proof  that  the  sequence  converges  to  the  required  solution  follows  an  identical 
argument  to  that  given  for  the  primal  method. 

The  corresponding  KKT  system  is 


6.3.  A  Dual  Method 
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As  in  the  primal  case  we  can  obtain  equations  in  just  At: 

AiH  +  -Z2)-lAJ Att  =  b- Ax  +  A{H  +  -Z7)-1(c+Hx-ATir-2z  +  -Z2xJ.  (6.3.5) 

n  n  n 

Since  x  appears  linearly  in  (6.3.2)  it  follows  that  x  =  x  +  Ax  is  independent  of  x. 
Rearranging  (6.3.4)  gives 

Hence,  Att  is  independent  of  x.  We  have  from  (6.3.3)  that 

[iZ~2Az  =  —x  —  Ax  +  fiZ_1e  =  —x  +  fiZ~le. 

Hence,  A z  is  also  independent  of  x  and  there  is  no  need  to  take  the  same  step  along 
Az  and  A^  as  Ax.  We  are  therefore  in  an  analogous  situation  to  the  primal  case, 
and  we  define  the  iterative  sequence  as 


Xk+i  =  Xk  +  or  Ax 
2*4-1  =  2*  +  cczAz 
TTfc+l  =  5Tfc  +  QzAtT. 

Again,  we  mav  ■  ecur  an  additional  variable  X*  according  to 


*o  —  xo 


M  _  w  t  rt.  A  <*t 

Hfe+l  “  ~T 


with 

Ax*  =  x*  +  Ax*  -  x*. 

Exactly  as  in  the  primal  case,  if  az  =  1  at  iteration  A\  then  (x*,x*,7r*)  are  dual 
feasible  for  k  >  I\.  Similarly  if  or  =  1  at  iteration  A",  then  x*  is  primal  feasible  for 
k  >  K. 

A  similar  convergence  analysis  to  the  ■'rimal  case  shows  that 

lim  x*  — ■  lim  x*  =  x*(/») 

k—oo  fc— oo 

lim  zk  =  z*{/t) 

OO 

lim  7r*  =  z  ( fi ). 

K — »00 
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6.4  A  Primal-Dual  Method 

If  we  prcmultiply  the  first  n  elements  of  fp  in  (6.2.3)  by  A'  we  obtain 


Zx  —  pe  =  0. 


Consequently,  the  solution  of  (6.3.2)  such  that  z  >  0  is  also  the  solution  of  the 
following  system  of  equations: 


fPd(z,  x,  rc) 


fZx  — pe  \ 
c  +  Hx  —  z  —  ATn 


=  0. 


\  Ax  —  b  ) 


(6.4.1) 


These  are  termed  primal-dual  equations  not  because  they  can  be  shown  to  arise  from 
a  primal-dual  form  of  LP,  but  simply  because  it  is  now  desirable  that  both  x  and  2 
remain  strictly  positive.  The  Jacobian  of  fpd  is  given  by 


Jpd  — 


lx  z  0  \ 

-7  H  AT 
V  0  A  0  > 


If  we  apply  Newton’s  method  to  (6.4.1),  the  following  system  is  solved  at  each  itera¬ 
tion: 


(6.4.2) 


(  A  z^ 

l  Zx — pe  ^ 

Jpd 

Ax 

=  —  c  +  Hx  —  z  —  ATr 

^  Att  j 

\  Ax  —  b  ) 

The  corresponding  KKT  system  is 
H  +  X-lZAT\(  AxN 


0 


—  Atf 


c  +  Hx  -  pX  le  —  At~ 
Ax  —  b 


(6.4.3) 


As  in  the  primal  and  dual  cases,  we  may  obtain  equations  in  just  A>r: 


A(H  +  X~lZ)-lAT&*  =  b  -  Ax  +  A(H  +  X~'Z)~l(gB  -  AT x).  (6.4.4) 

A  proof  of  convergence  is  no  longer  obvious  since  it  is  no  longer  transparent  that 
restricting  x*  >  0  and  z*  >  0  will  not  inhibit  convergence.  If  we  introduce  a  separate 
step  for  z  and  tt  as  before,  we  can  show  that  the  resulting  sequence  converges  to 
the  desired  values  provided  the  step  for  x  is  chosen  as  in  the  primal  method.  To 
appreciate  why,  observe  that  the  right-hand  side  of  (6.4.3)  and  (6.2.5)  are  identical. 


6.5.  A  Second  Primal-Dual  Method 
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Assume  for  the  moment  that  x k  is  feasible;  then  the  right-hand  side  of  both  systems 
is  (ATn  —  gB>0).  Let  V  denote  a  basis  for  the  null  space  of  the  columns  of  A,  i.e., 
AV  =  0.  Since  AAx*  =  0,  it  follows  that 

Aifc  =  VAxv, 


where 

Vt(H  +  X~xZ)VAxv  =  -VTgB. 

Premultiplying  this  equation  by  Ax£  gives 

A x\gB  =  A  xTvVTgB  =  -A  xTvVT(U  +  X~lZ)VAxv. 

Since  the  largest  eigenvalue  of  VT(H +X~l Z)V  is  bounded  and  the  smallest  eigenvalue 
is  bounded  away  from  zero  for  all  k,  it  follows  that  Ax  is  a  direction  of  sufficient 
descent  of  B(x,  g).  Given  that  we  choose  a  to  satisfy  the  Goldstein- Armijo  condition, 
it  follows  from  standard  convergence  analysis  that  lim*— <*,  H<7a||  =  0.  The  convergence 
of  {2/..}  and  {*-*}  follows  from  Lemma  6.2.2. 

We  have  assumed  that  xk  is  feasible.  A  similar  analysis  based  on  the  use  of  a  merit 
function  such  as  M(x)  =  B(x,g)  +  p||Ax  —  6||j  makes  this  assumption  unnecessary. 

6.5  A  Second  Primal-Dual  Method 

If  we  premultiply  the  first  n  equations  of  (6.2.3)  by  we  get 

X~tZ~ie  —  — e  =  0. 

Consequently,  the  solution  of  (6.2.3)  such  that  x  >  0  is  also  the  solution  of  the 
following  sytern  of  equations: 


fpdi^z  ?*) 


/  X-'Z-'e-le  \ 

c+  II  x  -  z  —  JPFtc 


=  0. 


Ax-b  ) 


(6.5.1) 


The  Jacobian  of  fpd  is  given  by 


/  -Z~2X~l  ~X~2Z~l  0  \ 
-I  II  AT 

K  0  A  0  > 
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If  we  apply  Newton’s  method  to  (6.5.1),  the  following  system  is  solved  at  each  itera¬ 
tion: 

/  Az\  /  X~xZ~xe  —  -e  \ 

J  pd  Ax  =  —  c  +  Hx  —  z  —  i4r7r  .  (6.5.2) 

\-An  )  \  Ax-b 

The  corresponding  KKT  system  is 

(h  +  X-'Z  AT\  /  Ax\  _  _  (c  +  Hx-2z  +  ~Z2X~  ATn 

y  A  0  J  y  —Air  J  y  Ax  —  b 

Note  the  matrix  on  the  right-hand  side  of  6.5.3  is  identical  to  that  of  6.4.4.  Again, 
using  the  Schur  complement  of  H  +  X~lZ ,  we  obtain  equations  in  An: 

A{H+X~xZ)-xATAn  =  b-Ax+A{H+X-xZ)-x{c+Hx-2z+^Z7x-ATn).  (6.5.4) 

In  this  primal-dual  method  a  proof  of  convergence  is  quite  straightforward.  We 
now  have  ||/pd||  =  oo  if  any  element  of  x  or  z  is  zero.  Consequently,  taking  a  step  that 
satisfies  the  Goldstein-Armijo  conditions  ensures  that  the  smallest  singular  value  of 
Jpd  is  bounded  away  from  zero  and  that  the  largest  singular  value  is  bounded.  Unlike 
the  three  previous  methods,  it  is  not  apparent  that  the  step  taken  in  the  x  variables 
may  be  different  from  that  taken  in  z  and  n. 


6.6  Affinj  Variants 


In  the  LP  case  it  can  be  shown  that  the  vector 


P  ~ 


lim 

*o 


Ax(p) 

IIAxMij 


exists.  Algorithms  based  on  using  p  as  the  search  direction  have  been  advocated 
by  a  number  of  researchers.  A  proof  that  such  an  algorithm  converges  is  given  by 
Vanderbei  and  Lagarias  [VL90].  It  is  of  some  interest  therefore  to  consider  whether 
similar  algorithms  exist  for  the  quadratic  case.  In  this  section  we  assume  H  is  positive 
definite. 

In  the  case  of  the  primal  method,  simple  inspection  of  the  equations  (6.2.5)  shows 
that  if  H  is  nonsingular, 


(hAt\(  p  \  { c  +  Hx  -  ATn  \ 

\/i  0  )  y  — An )  ~  P  [  Ax-b 


(6.6.1) 
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where  j3  is  a  positive  constant  chosen  to  ensure  ||p||  =  1.  The  search  direction  in  the  x 
variables  always  points  towards  the  constrained  stationary  point  of  the  QP  problem 
with  no  bounds.  Obviously  such  an  algorithm  will  not  in  general  converge  to  the 
desired  solution. 

In  the  case  of  the  dual  method  we  get 


(6.6.2) 


Clearly  any  search  direction  that  does  not  involve  the  Hessian  of  the  quadratic  is  not 
likely  to  be  sensible.  There  is  somewhat  more  hope  that  the  primal-dual  methods 
will  lead  to  sensible  search  directions  since  /t  only  occurs  on  the  right-hand  side  of  the 
Newton  equations.  For  the  first  primal-dual  method  the  corresponding  KKT  system 
is 


H  +  X~1ZAt\(  p 
A  0  /  V  -Air 


=  -/? 


C  +  II X  —  AT7T 
Ax  —  b 


(6.6.3) 


which  reduces  to  the  usual  primal-dual  affine  algorithm  for  LP  when  II  =  0.  Even 
though  the  right-hand  side  c  +  IJx  —  AT~  does  not  tend  to  zero,  such  a  method 
converges,  though  more  slowly  than  when  /t  is  smoothly  reduced. 

The  second  primal-dual  method  reduces  to 


I II  +X~'Z  A7 

l  A  0 
\ 


(6.6.4) 


and  in  this  case  the  right-hand  side  does  tend  to  zero.  It  is  not  immediately  apparent 
that  this  gives  a  sensible  direction.  However,  there  is  no  prima  facie  case  to  dismiss 
it.. 


6.7  The  Case  When  II  +  D  is  Trivially  Invertible 

If  H  +  D  is  trivially  invertible  (where  D  is  a  diagonal  matrix),  we  may  obtain  At*  by 
first  obtaining  As-  from  equations  (6.2.6),  (6.3.5),  (6.4.4)  or  (6.5.4).  Such  an  approach 
is  popular  in  the  LP  case,  in  spite  of  the  fact  that  the  Schur  complements  ADA7  (for 
various  diagonal  D)  arc  much  more  ill-conditioned  than  the  KKT  systems  themselves. 
The  approach  is  particularly  attractive  if  m  <§[  n. 

A  number  of  important  QP  problems  have  an  //  that  is  of  the  required  structure 
or  may  be  converted  to  such  a  problem.  F.xamplcs  of  two  such  objective  functions 
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and 

min  %  xtM  x  +  \xTVVTx, 

where  M  is  a  positive  definite  diagonal  matrix  and  V  is  an  n  x  t  matrix  with  t  «  n. 
In  the  second  case  it  may  appear  that  H  =  M  +  VVT  is  not  trivially  invertible. 
However  if  we  introduce  the  variable  y  =  VTx,  the  problem  is  transformed  into  one 
whose  Hessian  is  diagonal,  at  the  cost  of  adding  i  variables  and  t  general  constraints 
to  the  problem. 

Free  Variables. 

We  may  like  to  use  this  approach  when  H  is  only  positive  semidefinite.  Under 
such  circumstances  H  +  D  will  normally  be  positive  definite.  However,  if  there  are 
some  free  variables  it  may  be  that  H  +  D  is  singular.  We  shall  consider  just  the  primal 
algorithm  but  the  approach  we  advocate  may  be  used  in  all  the  methods.  Suppose 
that  just  ij  is  a  free  variable.  In  place  of  {fp)j  =  Zj  —  fi/xj  we  now  have  [fP)j  —  zr 
Provided  the  KKT  system  is  solved,  the  effect  this  equation  has  on  the  Jacobian  is 
of  no  consequence.  However,  in  the  KKT  system  in  place  of  //  +  ^_Y“2  we  now  have 
//  +  D.  where  dj  =  0  and  d,  =  //if2  for  i  ^  j.  Since  H  +  D  may  be  singular,  its  Schur 
complement  may  not  exist.  A  means  of  circumventing  this  difficulty  is  to  replace  the 
equation  Zj  —  0  by 

ez>Zj  =  //. 

We  then  have  d,  =  zj.  Since  z*(p)  =  at  the  solution  we  may  keep  z}>  t  >  0. 

It  follows  that  //  +  D  is  nonsingular  and  its  Schur  complement  exists. 

6.8  No  General  Constraints 

If  there  are  no  general  constraints,  the  barrier  subproblem  is  an  unconstrained  min¬ 
imization  problem.  A  problem  of  considerable  interest  in  this  category  is  bound- 
constrained  linear  least  squares.  For  the  primal  method  the  subproblem  is 

minimize  B(x,p)  =  cTx  +  \xTHx  —  p  ^  In  x}.  (6.8.  i) 

J 

Even  if  H  is  indefinite,  such  a  problem  may  be  solved  by  a  modified  Newton  method 
based  on  a  Cholcsky  factorization.  For  example,  see  Forsgren,  Gill  and  Murray 
[FGMSOaj. 

When  H  is  positive  definite  as  in  the  least-squares  case,  the  equivalent  reduction 
of  equations  (6.3.2),  (6.4.1)  and  (6.5.1)  is  also  possible.  Note  that  when  there  are 
only  simple  bounds,  degeneracy  cannot  arise  in  the  primal  or  the  dual. 
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6.9  Combination  Methods 

Any  linear  combination  of  the  four  sets  of  equations  (6  2-3;,  (6.3.2),  (6.4.1)  and  (6-5.1) 
may  also  be  used  to  define  a  method.  If  the  linear  combination  is  such  that  the  re¬ 
sulting  function  is  infinite  if  any  element  of  z  or  x  is  zero,  then  the  convergence  of 
Newton’s  method  follows  from  an  identical  argument  to  that  given  for  the  second 
primal-dual  method.  To  illustrate  the  approach,  consider  the  following  linear  combi¬ 
nation  of  the  fpd  and  /,**: 

Pfpd  +  -  fpd  =  0- 

The  resulting  Jacobian  is 

f-nZ-^X-'  +  lX  ~Z  —  fiX~2Z~l  0  \ 

H  '4 

\  0  A  0  / 

If  v  =  fiZ(Z2X2  —  fi2J)~l(fie  —  2 Zx  +  ~Z2X2e),  the  corresponding  KKT  system 
reduces  to 

(h  -f  X~lZ  AT \  (  a4  _  (e+  Hx  —  AT-  +  u\ 

\  A  0  )  \-Axj  Ax-b  )' 

which  has  the  same  KKT  matrix  as  the  two  individual  primal-dual  methods. 


Chapter  7 


QPS  Format 


7.1  Introduction 


For  linear  programming  problems,  the  MPS  format  [IBM76]  has  become  the  industry 
standard  and  it  is  recognized  by  all  commercial  mathematical  programming  systems. 
Unfortunately,  there  is  no  equivalent  standard  format  for  quadratic  programming.  In 
this  chapter  we  define  a  format  suitable  for  large,  sparse  quadratic  programs  of  the 
form 


minimize  cTx  +  \xTHx 

subject  to  bi  <  Ax  <  b2  (7.1.1) 

l  <  X  <u. 


The  basis  of  designing  the  format  has  been  to  adhere  as  closely  as  possible  to  the 
desirable  features  present  in  the  MPS  format. 


7.2  The  QPS  File 


A  QPS  file  is  required  to  define  the  quadratic  program.  The  file  contains  names  for 
the  variables  and  constraints,  as  well  as  the  nonzero  elements  of  A ,  H,  6j ,  b2,  c,  £  and 
u.  As  with  an  MPS  file,  a  fixed  format  must  be  use  1,  i.e.,  each  item  of  data  must 
appear  in  specific  columns.  The  QPS  file  is  divided  into  the  following  sections: 
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NAME 

VARIABLES 

HESSIAN 

CVECTOR 

ROWS 

COLUMNS 

ENDATA 

Each  section  starts  with  a  header  line  that  is  the  section  name  beginning  in  col¬ 
umn  1.  The  remaining  data  within  a  section  has  the  following  fixed  format: 

Columns  2.3  5... 12  15... 22  25... 36  40... 47  50... 61 

Contents  Key  NameO  Namel  Valuel  Name2  Value2 

In  addition,  “comment”  lines  are  allowed;  these  have  an  asterisk  **’  in  column  1 
and  any  characters  in  columns  2-22. 

The  NAME  Section. 

1 . .4  5. .12  15. ..22 

NAME  MYQPNAME  (for  example) 

This  section  constitutes  just  one  line  with  the  word  NAME  in  columns  1-4,  and  a 
name  for  the  problem  in  columns  15-22.  This  NAME  line  is  normally  the  first  one  in 
the  MPS  file,  but  it  may  be  preceded  or  followed  by  comment  lines.  The  name  for 
the  problem  is  used  to  label  the  solu‘ion  output  and  it  may  be  from  1  to  8  characters 
of  any  kind,  or  it  may  be  blank. 
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The  VARIABLES  Section. 


Key 

NameO 

Namel 

Valuel 

Name2 

Value2 

2.3 

5. .  .12 

15. . .22 

25... 36 

40.. .47 

50. . .61 

(fields) 

VARIABLES 

LO 

VAR01 

1.0 

UP 

VAR02 

3.5 

RA 

VAR03 

-4.6 

90.7 

(for  example) 

MI 

VAR04 

200.0 

FR 

VAR05 

This  section  contains  one  line  for  each  variable.  Field  Named  gives  the  variable 
an  8-character  name,  and  field  Key  defines  the  variable  type  by  specifying  its  bounds. 
The  default  bounds  on  all  variables  Xj  (excluding  slacks)  are  0  <  Xj  <  oo.  If  necessary, 
the  default  values  0  and  oo  can  be  changed  to  /  <  Xj  <  u.  The  various  bound-types 
are: 

Key  Bound-type 
FR  Free  variable 
FX  Fixed  variable 

MI  Minus  infinity 

PL  Plus  infinity 
RA  Range 

LO  Lower  bound 

UP  Upper  bound 

All  these  bound  specifications  overwrite  the  default  values.  The  numerical  values 
for  the  bounds  are  specified  in  the  fields  Valuel  and  ValueS.  For  instance,  they  define 
respectively  the  lower  and  upper  bound  for  the  range  type.  Incidentally,  only  this  type 
(RA)  needs  two  values.  The  remaining  bound-types  need  at  most  one.  The  numerical 
value,  say  /?,  can  appear  either  in  field  Valuel  or  Value2 ,  but  not  :n  both.  That  is, 
one  of  them  should  be  blank  in  the  input  file;  otherwise  the  sum  of  the  two  values  will 
be  taken  as  the  bound.  In  all  sections  of  the  QPS  file,  fields  Valuel  and  Value2  are 
read  using  Fortran  format  E12.0.  This  allows  values  to  be  entered  in  several  forms; 
for  example,  1.2345678,  1 . 2345678E+0,  and  12.345678E-1  all  represent  the  same 
number.  It  is  usually  best  to  include  an  explicit  decimal  point.  In  some  computer 
systems,  spaces  within  the  value  field  are  taken  as  0’s.  Hence,  if  an  exponent  like  E-3 
is  used,  it  must  be  right-justified  in  the  value  field. 
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The  types  LO  and  UP  are  modifiers  of  the  default  values.  The  tyoe  LO  will  overwrite 
only  the  default  lower  bound.  Analogously,  the  UP  type  will  overwrite  only  the  default 
upper  bound.  Every  variable  must  appear  in  the  VARIABLES  section  and  at  most 
once.  If  a  variable  is  missing,  it  will  not  be  recognized  when  its  name  appears  in  later 
sections.  If  there  are  duplicates  (more  than  one  entry  with  the  same  variable  name) 
an  error  message  will  be  issued  indicating  so.  For  instance,  the  following  combination 
cannot  be  used  to  define  a  bounded  variable: 

LO  VAR123  -5.0 

UP  VAR123  7.0 

Instead,  the  range  bound-type  should  be  used: 

RA  VAR123  -5.0  7.0 

For  the  bound-types  other  than  RA,  let  /  and  u  be  the  default  lower  and  upper 
bounds  respectively,  and  let  x  and  /?  be  the  variable  and  bound  specified  in  the 
VARIABLES  section.  The  various  types  allowed  result  in  the  following  bounds: 


Key 

Bound-type 

Resulting  bounds 

LO 

Lower  bound 

0 

<  X  < 

u 

UP 

Upper  bound 

e 

<  I  < 

P 

FX 

Fixed  variable 

P 

<  J  < 

P 

FR 

Free  variable 

—  00 

<  X  < 

oo 

MI 

Minus  infinity 

— oo 

<  X  < 

p 

PL 

Plus  infinity 

p 

<  X  < 

oo 

Applying  these  rules  to  the 

examples 

at 

following  bounds: 


1.0 

< 

VAR01 

< 

oo 

— oo 

< 

VAR02 

< 

3.5 

-4.6 

< 

VAR03 

< 

90.7 

— oc 

< 

VAR04 

< 

200.0 

— oo 

< 

VAR05 

< 

oo 

Note  that  types  FR,  MI,  or  PL  should  always  be  used  to  specify  “infinite"  bounds: 
they  imply  values  of  ±102°,  which  are  treated  specially  at  certain  times.  Note  also 
that  types  MI  and  PL  are  different  from  the  MPS  format. 
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The  HESSIAN  Section. 


NameO 

Namel 

Value  1 

5..  .12 

15.. .22 

25... 36 

HESSIAN 

VAR01 

VAR01 

1.0 

VAR01 

VARll 

2.0 

VAR01 

VAR11 

VAR12 

-1.0 

VAR11 

VAR23 

-8.4 

VAR25 

VAR25 

VAR33 

2.2 

VAR22 

YAR25 

3.3 

NameS  ValueS 

40... 47  50... 61  (fields) 


VAR03 

-3.0 

VAR21 

4.5 

VAR30 

5.5 

VAR15 

80. 

VAR25 

3.3 

VAR22 

0.4 

This  section  contains  the  nonzero  elements  of  the  Hessian  of  the  quadratic  pro¬ 
gram.  To  specify  the  indices  for  the  nonzero  element  Ay,  the  names  associated  with 
the  variables  X;  and  Xj  in  the  previous  section  are  used.  Each  line  defines  at  most 
two  elements.  Ay  and  hkj  say.  Fields  NameO ,  Namel  and  Name2  contain  the  names 
for  the  jth  variable,  ith  constraint  and  fcth  constraint  respectively.  Fields  Value  1  and 
Value2  contain  the  numerical  values  Ay  and  hkj  respectively. 

Since  the  Hessian  is  symmetric,  only  the  elements  in  either  the  lower  or  the  upper 
triangular  block  need  to  be  specified.  If  both  elements  Ay  and  A,,  are  entered,  an 
error  message  will  be  issued  indicating  that  there  is  a  duplicate. 

Notice  that  the  element  ay  of  the  constraint  matrix  involves  indices  of  two  distinct 
sets:  the  VARIABLE  names  and  the  ROW  names.  In  contrast,  the  indices  of  the 
element  Ay  of  the  Hessian  arc  both  VARIABLE  names.  Therefore,  the  ordering  of 
the  variables  becomes  critical  when  entering  the  Hessian.  It  is  more  efficient  and 
computationally  convenient  to  have  an  ordering  already  assigned  to  the  variables 
before  the  Hessian  u  read  in.  For  this  reason,  we  chose  the  VARIABLE  section  to 
be  the  one  that  defines  the  ordering  of  the  variables,  namely  the  order  in  which  they 
appear. 

The  nonzero  elements  of  the  Hessian  must  be  entered  by  column;  that  is,  they 
must  appear  grouped  together  before  the  elements  of  the  next  column  in  the  QPS 
file.  If  a  erdunr..  several  nenzeros  it  does  not  matter  what  order  they  appear  in. 
Columns  may  abo  be  specified  in  any  order.  However,  sorting  the  Hessian  dements 
can  be  avoided  if  the  columns  of  the  Hessian  are  entered  in  the  order  defined  in 
the  VARIABLES  section,  and  only  the  elements  in  the  lower  triangular  block  are 
specified.  The  implementation  recognizes  such  a  case  and  economizes  accordingly. 
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The  CVECTOR  Section. 

Kamel  Valuel 

Name2 

Value2 

15... 22  25... 36 

40.. .47 

50... 61  (fields) 

CVECTOR 

VAR01  1.0 

VAR02 

-3.0 

VAR50 

2.2 

VAR33  -1.3 

VAR20 

50.1 

VAR40  2.2 

This  section  contains  the  nonzero  elements  of  the  vector  c  involved  in  the  linear 
term  of  the  quadratic  objective  function. 

For  each  nonzero  component  of  c,  say  Cj,  field  Namel  (or  Name2 )  contains  the 
name  for  the  jth  variable  and  field  Value  1  (or  Value2)  contains  the  numerical  value 
Cj.  At  most  two  nonzero  components  of  c  can  be  specified  per  line. 

The  ROWS  Section. 

Key  NameO  Narrel  Valuel  Name2  Value2 


2.3 

5.. 15  15.. 22  25.-36  40.. 47  50.. 61  (fields) 

ROWS 

E 

R0W01 

-3.0 

G 

R0W02 

2.2 

RA 

R0W03 

-1.3  50.1 

L 

R0W04 

100.2 

In  this 

section  we 

have  kept  the  terminology  used  in  linear  programming,  where 

the  general  constraints  are  referred  as  rows.  The  ROWS  section  contains  one  line  for 

each  constraint. 

Key 

Row-type 

Resulting  constraint 

E 

— 

aTx  =  0 

G 

> 

aTx  >  0 

L 

< 

aTx  <  3 

RA 

range 

VI 

o 

VI 

The  1-character  Key  may  be  in  column  2  or  3.  Row-types  have  the  natural 
interpretation  in  terms  of  a  linear  function  aTx  and  bounds  3,1.  and  u.  Nonzero 
dements  of  the  row- vector  a  will  appear  in  appropriate  parts  of  the  next  section. 
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The  COLUMNS  Section. 


NameO 

Namel 

Valuel 

Name2 

Value2 

5.  .12 

15. .22 

25... 36 

40. .47 

50... 61  (fields) 

COLUMNS 

VAR01 

ROWOl 

1.0 

R0W02 

-3.0 

VAR01 

R0W05 

2.5 

R0W03 

1.123 

VAR02 

R0W02 

-11.123 

VAR02 

R0W20 

7.777 

VAR03 

R0H01 

1.2E-02 

This  section  defines  the  constraint  matrix.  It  uses  the  name  assigned  to  each 
variable  Xj  in  the  VARIABLES  section  and  lists  the  nonzero  entries  in  the  cor¬ 
responding  column  of  the  constraint  matrix,  using  the  same  fields  as  the  HESSIAN 
section.  As  in  the  latter,  the  nonzero  elements  within  a  particular  column  must  be 
grouped  together.  If  a  column  has  several  nonzeros,  it  does  not  matter  what  order 
they  appear  in  (as  long  as  they  all  appear  before  the  next  column).  Columns  in  the 
constraint  matrix  correspond  to  variables  in  the  problem;  hence  the  VARIABLES 
section  also  defines  an  ordering  for  them.  As  in  the  HESSIAN  section,  it  should  be 
possible  to  enter  columns  in  any  order,  and  a  sort  would  again  be  required.  This 
option  has  not  yet  been  implemented. 

In  general,  NameO  is  the  column/variable  name,  and  Namcl ,  Valuel  give  a  row 
name  and  a  value  for  some  element  in  that  column.  If  there  is  another  nonzero 
element  in  that  column,  it  may  appear  ^s  Name2,  ValueS  on  the  same  line,  or  may 
be  on  the  next  line.  If  either  Namel  or  Name2  is  blank,  the  corresponding  value  is 
ignored. 

There  is  no  need  to  specify  columns  for  the  slack  variables;  they  are  incorporated 
implicitly. 

If  the  columns  appear  in  the  same  order  as  in  the  VARIABLES  section,  the  im¬ 
plementation  recognizes  that  the  constraint  data  need  not  be  sorted. 

7.3  Differences  from  MPS  Format 

The  design  of  the  QPS  format  combines  the  desirable  features  present  in  the  MPS 
format  for  linear  programs,  with  the  necessary  extensions  to  be  able  to  specify  a 
quadratic  program.  In  this  section  we  describe  the  differences  between  the  two  for¬ 
mats. 

In  the  MPS  format  the  names  for  the  variables  are  specified  in  the  COLUMNS 
section  and  their  bounds  are  defined  in  the  BOUNDS  section.  In  the  QPS  format 
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there  is  no  BOUNDS  section.  Instead,  a  new  section,  the  VARIABLES  section,  has 
been  introduced  where  both  names  and  bounds  are  specified. 

With  the  MPS  format  it  is  possible  to  define  several  related  problems  in  just  one 
file,  by  specifying  several  right-hand  sides  (for  example)  or  several  bound  sets.  At  the 
moment  only  one  problem  can  be  specified  in  a  QPS  file. 

In  order  to  minimize  the  storage  required,  the  QPS  fo:  mat  does  not  allow  a  VARI¬ 
ABLE  name  to  be  the  same  as  a  ROW  name. 

The  general  constraints  are  referred  to  as  rows  in  both  formats.  In  the  MPS 
format  a  constraint  is  defined  as  follows:  its  name  and  type  are  specified  in  the 
ROWS  section,  its  right-hand  side  (if  nonzero)  is  defined  in  the  RIIS  section  and 
for  the  case  where  the  constraint  has  lower  and  upper  bounds,  they  are  specified 
implicitly  in  the  RANGES  section.  In  the  QPS  format  the  ROWS  section  contains 
not  only  the  name  and  type  of  a  constraint,  but  also  its  bounds.  Hence,  the  RHS  and 
RANGES  sections  are  no  longer  needed. 

Unlike  the  MPS  format,  in  the  QPS  format  the  COLUMNS  section  does  not 
include  the  vector  c  involved  in  the  linear  term  cFx  of  the  objective  function.  Instead, 
it  is  defined  in  the  CVECTOR  section. 

A  new  section  has  been  introduced  in  order  to  specify  the  Hessian  matrix.  The 
format  used  to  express  this  matrix  is  identical  to  the  one  used  in  the  COLUMNS 
section  of  the  MPS  format  to  define  the  constraint  matrix. 

In  the  MPS  format  there  are  only  two  sections  where  the  variables  appear:  the 
COLUMNS  section  and  the  BOUNDS  section.  The  ordering  assigned  to  the  variables 
corresponds  o  that  in  which  they  appear  in  the  COLUMNS  section.  In  the  QPS 
format,  there  are  three  sections  involving  the  variables:  the  VARIABLES  section, 
the  HES  TAN  section,  and  the  COLUMNS  section.  The  VARIABLES  section  must 
come  first  and  it  is  used  to  define  the  ordering  of  the  variables.  This  extension  proved 
necessary  because  the  HESSIAN  and  COLUMNS  sections  may  not  contain  a  complete 
list  of  variables.  (In  a  general  QP,  both  H  and  A  may  contain  empty  columns.) 

If  the  HESSIAN  section  was  required  to  contain  a  complete  list  of  variables,  it 
would  seem  that  we  could  use  it  as  the  one  that  defines  the  ordering  in  the  variables. 
Unfortunately  it  vruld  no  longer  be  possible  to  avoid  sorting  the  Hessian  elements  if 
we  insist  on  checking  for  duplicates. 


7.4  Test  Problems 

Linear  programs  are  a  subset  of  the  problems  under  study.  Hence,  a  code  has  been 
developed  that  takes  as  input  the  specification  of  a  linear  programming  problem  in 
MPS  format  and  produces  as  output  the  corresponding  definition  in  QPS  format.  In 
this  wav,  test  problems  can  be  generated  from  the  set  of  linear  programs  available  in 
the  software  library  cllib  [GayS5j. 
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Finally  we  point  out  that  the  QPS  format  is  particularly  appropriate  to  define 
problems  that  result  from  adding  a  quadratic  term  to  an  existing  linear  program, 
whose  specification  in  the  MPS  format  is  already  available.  For  instance,  regularized 
linear  programs  can  be  defined  in  a  straightforward  manner,  i.e.,  those  with  objective 
function  of  the  form 

cTx  +  \p\\x  -  x0||2, 

where  p  >  0.  (By  varying  the  size  of  p  a  series  of  test  cases  can  be  generated  with 
varying  degrees  of  nonlinearity.)  On  the  other  hand,  a  different  format  might  be 
more  suitable  when  the  sparsity  pattern  of  the  constraints  and  the  Hessian  of  the 
Lagrangian  function  are  related.  Quadratic  programs  of  this  form  arise  in  the  opti¬ 
mization  of  electrical  power  transmission  (the  Optimal  Power  Flow  problem  [SAMSOjj 
when  one  applies  a  sequential  quadratic  programming  method  using  exact  second 
derivatives. 

In  general,  the  QPS  format  will  be  appropriate  as  long  as  the  Hessian  can  be 
expressed  using  one  of  the  common  formats  for  sparse  matrices,  namely,  a  list  of 
triples  ( i,j,htj ). 


7.5.  Example 
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*  This  is  an 

*  . 1. . . 

example  of 

. . 2 _ 

a  QP  in  QPS  format. 

. 3 _ 4 _ 

.5 . . 
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8.0 

RA  PIE 

0.0 

2.0 

RA  PORKBEAX 

0.0 

2.0 

UP  BREAD 
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LO  BUTTER 
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Chapter  8 

Implernentation  Issues  and  Numerical  Results 

8.1  Introduction 

An  implementation  of  the  barrier  algorithm  BARQP  was  developed  for  the  solution 
of  both  linear  and  quadratic  programming  problems  in  the  standard  form 

minimize  Qlx)  =  cTx  +  \xrHx 
subject  to  Ax  =  b,  l  <  x  <u. 

The  implementation  treats  A  and  H  as  sparse  matrices  and  b.  c,  l  and  u  as  sparse 
vectors,  specified  in  the  QPS  format  of  Chapter  7.  The  algorithm  follows  closely  that 
described  in  Chapters  3  and  4. 

Barrier  algorithms  are  sensitive  to  many  of  the  parameters  used  in  their  definition. 
The  development  of  a  practical  implementation  involves  considerable  experimentation 
to  determine  values  for  these  critical  parameters  that  strike  a  balance  between  effi¬ 
ciency  and  reliability.  In  this  chapter  we  describe  the  main  parameters,  such  as  the 
initial  /r,  the  final  /r,  and  the  distance  of  the  initial  point  from  its  bounds. 

Not  all  the  features  are  currently  incorporated  in  the  code.  For  example,  we  have 
yet  to  include  procedures  for  computing  directions  of  negative  curvature.  As  previ¬ 
ously  mentioned  we  hope  to  reduce  the  need  for  such  directions  by  a  suitable  choice 
of  parameters.  However,  any  complete  implementation  would  need  such  procedures 
as  a  safeguard. 

One  of  the  aims  has  been  to  explore  the  performance  of  the  barrier  algorithm  on  a 
set  of  real-world  problems,  and  to  compare  its  execution  time  with  that  of  MINOS,  a 
well  known  code  for  sparse  linear  and  nonlinear  programming  [MS83j.  These  results 
are  not  meant  as  an  accurate  reflection  of  the  performance  of  a  barrier  QP  algorithm 
but  are  intended  to  serve  as  an  indication  of  the  algorithm’s  potential. 

8.2  Free  Variables 

The  form  of  the  problem  discussed  so  far  assumes  that  a  variable  has  one  or  two 
bounds.  In  practice  some  variables  may  have  no  bounds.  Such  variables  arc  termed 
free  and  they  are  not  included  in  the  barrier  term.  The  resulting  zero  diagonals  present 
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a  difficulty  if  the  KKT  system  is  solved  via  the  Schur  complement  (see  Equation 
(6.2.6)).  (In  contrast,  free  variables  are  an  asset  to  the  simplex  algorithm  since  they 
may  be  included  in  the  basis  throughout.)  If  the  KKT  system  is  solved  directly,  the 
only  difficulty  is  that  the  pivot  order  chosen  in  the  symbolic  factorization  may  be 
revised  in  the  numerical  phase.  Interestingly  for  QP’s.  free  variables  may  not  present 
a  difficulty  even  if  the  KKT  system  is  solved  by  the  Schur  complement  approach, 
since  Hs  =  H  +  uD  couM  be  nonsingular  even  if  D  is  singular. 


8.3  Feasibility 

Interior-point  methods  differ  from  the  simplex  method  in  satisfying  the  bounds 
t  <  i  <  u  throughout  and  declaring  “primal  feasibility”  when  Ax  =  b.  (The  primal 
simplex  method  satisfies  Ax  =  6  throughout  ar«d  achieves  feasibility  when  (  <  x  <  u.) 

Early  interior- point  implementations  for  LP  achieved  primal  feasibility  by  setting 
up  a  Phase  1  problem  of  the  form 

min  £ 

subject  to  Ax  +  £r0  =  b , 

(  <  x  <  u,  £>0. 

(e.g.  [KarS4.GMS*86,MarS9j)1  where  r0  =  b—Ax0  is  the  residual  for  some  given  initial 
point  j0  with  (  <  xQ  <  ti.  A  refinement  is  to  use  a  “composite  objective”  of  the  form 
•xe'x  +  £  for  some  u:  >  0,  in  order  to  drive  the  feasibility  phase  closer  to  optimality 
for  the  original  problem. 

This  approach  was  followed  in  the  initial  versions  of  our  barrier  QP  method.  1  he 
Phase  1  problem  took  the  form 

min  u,-{cT x  +  %xTHx)  +  £ 

subject  to  Ax  r  £r0  .-=  h. 


( <x<u ,  £  >  -!. 

and  feasibility  was  declared  when  £  decreased  from  £o  —  i  to  zero.  1.  nfor'unately 
r3  is  Typically  a  dense  vector,  and  the  parameter  a.’  must  l>c  chosen  carefully.  Also, 
provision  must  be  made  to  reduce  w  at  any  iteration  if  £  does  not  decrease. 

These  difficulties  are  avoided  by  treating  the  barrier  method  as  the  application 
of  Newton’s  method  to  the  QP  feasibility  and  optimality  conditions,  as  described  m 
Chapter  4.  The  result  is  a  "single- phase”  primal  method  in  which  the  merit  function 
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is  reduced  at  each  iteration.  In  general,  r  =  b  —  Ax  is  reduced  to  zero  before  gL,  and 
remains  zero  thereafter  (since  the  equations  Ax  =  b  are  linear).  Thus,  feasibility  may 
be  achieved  even  if  the  algorithm  is  terminated  prematurely. 


8.4  Solution  of  the  KKT  Systems 

Our  main  software  tool  for  solving  the  KKT  systems  is  the  Harwell  Subroutine  Library 
package  MA27,  due  to  Duff  and  Reid  [DR82,DR83].  This  is  a  multifroutal  code  for 
solving  sparse  symmetric  linear  equations  Ky  =  z.  When  I(  is  sufficiently  positive 
definite,  MA27  computes  a  sparse  Cholesky  factorization  K  =  UTDU ,  where  D  is 
diagonal  and  U  is  unit  upper  triangular.  Otherwise,  MA27  uses  an  implementation 
of  the  Bunch-Kaufman  algorithm  [BI<77]  to  compute  K  =  UrDU,  where  D  is  now 
block-diagonal  with  blocks  of  dimension  1  or  2. 

Two  of  the  design  aims  for  MA27  were  to  be  competitive  with  existing  Cholesky 
codes  on  definite  systems,  and  to  be  nearly  as  efficient  on  indefinite  systems.  The 
first  airn  was  achieved  and  the  second  was  at  least  partially  achieved,  in  the  sense 
that  the  package  has  proved  to  be  efficient  on  systems  that  are  only  slightly  indefinite 
(i.e.  systems  where  nearly  all  eigenvalues  have  the  same  sign).  Unfortunately,  the 
KKT  systems  arising  in  our  context  are  usually  very  indefinite  (unless  m  <C  n).  The 
analyze  phase  of  MA27  assumes  that  I(  is  definite  and  in  particular  that  all  diagonals 
of  K  are  nonzero.  When  this  is  not  true,  the  tentative  pivot  order  is  sometimes 
drastically  revised  in  the  numerical  phase,  with  a  consequent  increase  in  nonzeros  in 
the  numerical  factors.  The  associated  increase  in  computation  time  is  substantial  in 
the  case  of  KKT  systems. 

In  the  near  future,  we  expect  a  significant  improvement  will  be  realized  using  a 
new  version  of  the  package  to  be  called  MA47;  see  Duff  el  al.  [DGR*89]. 

In  general,  Aasen’s  tridiagonalization  method  [Aas7l]  is  considered  competitive 
with  the  Bunch-Kaufman  approach  for  solving  dense  indefinite  systems.  Aasen’s 
method  computes  a  factorization  of  the  form  K  =  U7)TU  where  T  is  tridiagonal. 
However,  we  do  not  know  of  a  sparse  implementation. 


8.5  Rank  Deficiency  in  the  Constraint  Matrix 


In  many  practical  applications,  the  constraint  matrix  is  either  poorly  conditioned 
or  singular.  The  associated  KKT  system  is  then  also  ill-conditioned  or  singular. 
(Consider  the  case  when  the  rows  of  the  constraint  matrix  A  are  linearly  dependent.) 
When  A  is  ill-conditioned,  we  consider  the  following  modified  KKT  system: 


Hb  /ir 
A  -61 


(8.5.1) 
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where  6  >  0,  gL  =  cT+  Hxk  —  ATnk  and  r  =  b  —  Axk..  In  most  cases  the  modified 
system  will  be  better  conditioned  when  A  is  poorly  conditioned.  If  8  is  small  we 
would  not  expect  the  convergence  of  Newton’s  method  to  be  significantly  affected. 
In  practice  we  have  used  8  values  in  the  range  (10“8, 10-'1).  Systems  of  the  form 
(8.5.1)  have  been  studied  in  the  context  of  sequential  quadratic  programming  by 
[Mur69,Big75,Gou86b]. 

The  modified  system  corresponds  to  the  following  perturbed  QP: 


min  cTp  +  \pTHBp  +  \8yTy 

p,y  1  1 

subject  to  Ap  —  8y  =  r. 

We  obsei”-  that  the  dual  to  (8.5.2)  is 

min  rTy  +  \pTHBp  +  \8yTy 

subject  to  ATy  +  HBp  =  c. 


(8.5.2) 


(8.5.3) 


In  this  formulation  we  see  that  the  modification  term  serves  to  bound  the  Lagrange 
multipliers,  which  are  unbounded  when  A  is  rank  deficient.  (The  modification  can 
therefore  be  viewed  as  a  trust-region  treatment  of  the  Lagrange  multipliers.) 

We  conclude  this  discussion  by  simply  stating  that  the  search  direction  given  by 
the  modified  system  is  usually  still  a  descent  direction  for  the  merit  function  given  in 
Chapter  3.  More  specifically, 

(pT  qT)VM{x, tt)  =  -||st|j  -  ||r||  -  |j^jj q\ 
which  is  negative  if  qTr  >  0  or  8  is  sufficiently  small.. 


8.6  Simple  Extrapolation 

Let  x  and  gL(x,p)  be  the  final  point  and  gradient  obtained  for  a  particular  barrier 
parameter  p.  When  p  is  reduced,  we  generally  retain  x  as  the  starting  point  for 
the  new  subproblem.  Unfortunately,  ||^A ||  inevitably  suffers  a  sharp  increase.  For 
example,  consider  a  variable  x}  that  is  close  to  its  lower  bound  of  zero,  and  suppose 
p  changes  to  ftp  (0  <  /?  <  1).  By  definition,  the  j- th  component  of  gL  changes  by  the 
amount 

p/xj  -  (3p/xj  =  (1  -  P)p/xj  w  (1  -  P)zh 

where  Zj  is  the  j-th  element  of  c  +  Hx  —  Atk.  If  Zj  0  and  |j(7£||  was  previously 
small,  we  see  that  ||(7Z,||  will  no  longer  be  small. 


98 


Chapter  8.  Implementation  Issues  and  Numerical  Results 


A  simple  means  for  counteracting  this  effect  is  to  change  x}  to  0Xj,  i.e.,  to  reduce 
“small”  elements  of  x  by  the  same  factor  as  p,  so  that  the  corresponding  gradient 
element  does  not  change.  Our  experience  has  been  that  this  strategy  can  reduce 
the  number  of  minor  iterations  needed  to  solve  each  of  the  subproblems,  as  long  as  a 
conservative  test  is  used  to  judge  whether  Xj  is  “close”  to  a  bound.  (The  main  danger 
lies  in  moving  the  “wrong”  elements  cf  x  towards  their  bounds.) 

8.7  Conventional  Extrapolation 

Let  x(pk)  be  the  solution  to  the  nonlinear  subproblem  B(x,pk).  As  described  in  Fi- 
acco  and  McCormick  [FM68],  if  /x*  were  steadily  reduced  to  zero,  the  infinite  sequence 
of  points  {x(pk)}  would  define  a  trajectory  leading  to  x*,  the  solution  of  the  origi¬ 
nal  problem.  In  practice  we  estimate  ,t (/**)  for  only  a  few  points  on  the  trajectory. 
Using  conventional  extrapolation  we  can  define  an  approximation  to  x*  =  x(0)  by 
extrapolating  the  solutions  of  two  subproblems.  Consider  the  expansion 

*(/**)  +7  Hk  +  0(pl),  (8.7.1) 

where  7  =  x'(0),  and  let  x(px)  and  x‘  V  the  solution  of  subproblems  B(x,  px)  and 
B(x,p2)  respectively,  with  px  >  p2.  8.7.1)  we  obtain 

pxx{p2)  -  pix(px)  =  [px  -  p2)x*  -r  0{p\), 

so  that 

*  1 

x  &  - r{pxx{p2)  -  p2x{pi)). 

{91  ~  M2) 

Since  x*  =  x(p2)  +  p  for  some  p,  we  can  define  an  approximate  step  to  the  solution 
as  follows: 

p  =  X*  -  x(p2) 

~  fhliM  ~  /z^(m  1)  -  (Ml  -  M2MM2) 

(M2-M1) 

=  7 — — — :(a-’(Mi)  ~  a:(M2))- 

(Mi  ~  M2) 

When  p  is  decreased  in  a  linear  fashion,  i.e.  p2  —  0p\  with  0  <  0  <  1,  we  have 

(i  ‘-  m^M2)-^))'  (8-7-2) 

which  can  be  used  as  a  search  direction  to  obtain  an  estimate  x*  «  x{p2)  +  ap  for 
some  positive  steplength  a.  Note  that  nothing  is  lost  if  the  search  proves  unsuccessful, 
since  the  effort  expended  in  the  search  is  negligible. 
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In  the  early  iterations  when  we  are  far  from  the  solution,  it  may  seem  better  to 
extrapolate  to  2(^3),  the  solution  of  the  next  subproblem  B(x,fiz),  instead  of  to  2*. 
Analogously  to  (8.7.1)  we  have 

x(nk)  =  x{p 3)  +  7(/i*  -  M 3)  +  0(n it  -  /i3)2, 

where  7  =  x'(nz).  Hence, 

(92  ~  H3)x(ni)  -  (pi  -  H3)x(h2)  =  {Hi  -  H2)x3  +  0(i4), 


so  that 


(8.7.3) 


where  /?*4  =  /x*  —  Ps-  As  before,  when  /x  decreases  linearly,  (8.7.3)  becomes 

x{n3)  «  -~x(^)  +  ^-^x(/z2). 

Since  2(^3)  =  x(/x2)  +  p  for  some  p,  we  can  define  an  approximate  step  to  2(^3)  as 


P  =  x(Ps)  ~  xM 

=  ~  XM)-  (s-7-4) 

When  fi  is  decreased  in  a  linear  fashion,  (8.7.4)  becomes 

p  «  P(x{p2)  -  2(/xj )).  (8.7.5) 

Comparing  (8.7.2)  and  (8.7.5),  we  see  that  the  extrapolation  to  2*  or  to  2(^x3)  leads 
to  the  same  search  direction  (to  within  a  scalar  factor).  Hence  the  only  difference  can 
be  that  the  linesearch  may  give  a  non-equivalent  steplength. 


8.8  Parameter  Selection 

Within  the  barrier  framework,  a  number  of  parameters  must  be  specified.  In  general, 
we  have  found  that  the  method  is  somewhat  sensitive  to  this  parameter  selection, 
though  there  is  a  fairly  broad  range  of  suitable  choices.  In  this  section  we  summarize 
the  criteria  used  in  our  barrier  method  and  highlight  some  of  the  issues. 

•  Initial  barrier  parameter  p0 

This  parameter  must  be  normalized  by  the  number  of  logarithmic  terms  and 
the  initial  function  value.  That  is,  for  a  specified  value  p0  we  take 

/*»  =  (*V(*o). 

n 
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This  corresponds  to  setting  the  weight  of  the  logarithmic  perturbation  relative 
to  the  original  QP  function  value  F(x0).  In  general,  if  po  is  too  large,  a  number 
of  iterations  will  be  wasted.  That  is,  we  solve  a  series  of  subproblem  that  are 
distant  from  the  one  we  are  interested  in.1  If,  on  the  other  hand,  p0  is  chosen 
too  small,  the  iterates  tend  to  progress  near  the  boundary  of  the  feasible  region 
similarly  to  an  active-set  method.  In  practice,  a  range  of  values  for  p0  between 
1  and  103  work  reasonably  well  on  most  problems. 

•  Final  barrier  parameter  pj 

As  discussed  in  Chapter  3,  the  relative  accuracy  of  the  final  answer  is  propor¬ 
tional  to  the  size  of  the  smallest  p  that  is  used  in  minimizing  B(x,p).  Thus, 
for  a  specified  value  pj, 

H  =  Q)F{x „) 

yields  |  log(j5/)|  digits  of  accuracy.  We  have  used  pj  =  10-6. 

•  Initial  point  x0 

The  provision  of  a  “good”  initial  point  is  critical  to  the  performance  of  the 
algorithm.  In  general,  both  feasibility  as  well  as  optimality  should  be  considered. 
Ideally,  the  initial  point  should  be  chosen  near  the  solution  of  the  minimization 
problem  for  the  corresponding  choice  of  p.  If,  for  example,  approximate  values 
of  the  variables  are  known,  they  can  be  used  for  an  initial  point.  Unfortunately, 
this  information  is  normally  not  known  and  we  must  be  satisfied  with  a  “rough” 
initial  point  that  is  judiciously  far  from  the  bounds. 

For  uhe  barrier  method,  it  is  necessary  that  the  initial  point  be  feasible  with 
respect  to  the  inequality  constraints.  Since  the  inequalities  are  only  simple 
bounds,  this  is  easily  accomplished.  However,  feasibility  of  the  equality  con¬ 
straints  as  well  as  the  overall  optimality  conditions  should  be  considered  when 
determining  the  starting  point. 

A  number  of  possible  strategies  have  been  proposed  in  conjunction  with  the 
barrier  method  [Mar89].  One  possibility  is  to  choose  i0  such  that 

INI  «  \\b\l 

where  b  is  constructed  from  positive  lower  bounds  and  negative  upper  bounds 
as  follows: 

h  =  b  -  ^2  ?jai  ~  ujaj- 

tj>  0  U;<0 

1  It  should  be  noted  that  even  when  the  barrier  function  is  far  from  the  original  objective  function, 
significant  progress  can  be  made  toward  feasibility. 
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If  the  matrix  A  is  well-scaled,  this  choice  generates  a  starting  point  that  is 
probably  of  the  same  magnitude  as  the  true  solution.  Other  possibilities  include 
using  some  kind  of  greedy  algorithm  to  choose  each  component  of  xq  such  that 
the  initial  residual  ro  =  b  —  Axq  is  to  some  extent  minimized. 

In  our  implementation,  the  initial  point  has  been  chosen  to  encourage  rapid 
convergence  to  the  solution  of  the  first  barrier  subproblem.  To  illustrate  some 
of  the  issues  involved,  consider  the  barrier  function 

B(x)  =  F(x)  -fif^ln(xj)  (8.8.1) 

j= i 

corresponding  to  the  bounds  x  >  0.  Let  us  first  assume  that  x0  is  chosen  to  lie 
“close”  to  the  bounds  (i.e.,  ||x0||  <C  1).  In  a  neighborhood  of  the  bounds,  the 
objective  function  is  highly  nonlinear;  that  is,  the  nonlinear  logarithmic  term 
dominates  the  overall  function  value.  This  nonlinear  behavior  can  be  seen  from 
(4.7.6), 

Sl  =  (I  -  a)0L  +  1(a), 

which  expresses  the  new  gradient  as  a  function  of  the  old  gradient  and  the 
current  point.  The  last  term  t(a)  accounts  for  the  nonlinearity  of  the  function. 
We  can  see  from  (4.9.10)  that  when  some  of  the  xf  s  are  near  their  bounds, 
this  nonlinear  term  will  dominate  the  equation  (unless  the  corresponding  p,’s 
are  small).  Newton’s  method  proceeds  by  forming  a  linear  approximation  to 
t(a).  However,  the  linear  approximation  does  not  accurately  capture  the  highly 
nonlinear  behavior  of  the  function  when  some  components  of  x  arc  near  their 
bounds.  Therefore,  Newton’s  method  cannot  be  expected  to  converge  quickly 
(as  is  observed  experimentally  for  many  problems). 

Now  consider  the  converse  situation;  that  is,  assume  that  the  starting  point 
lies  far  from  the  bounds.  In  this  case,  xo  is  probably  far  from  the  true  solu¬ 
tion  (assuming  that  some  components  of  x*  lie  on  their  bounds).  However,  the 
gradients  of  the  logarithmic  terms  in  (8.8.1)  are  now  almost  negligible.  Un¬ 
fortunately,  it  is  these  terms  that  force  the  iterates  to  pass  through  the  center 
of  the  feasible  region  (as  opposed  to  moving  along  the  boundary).  Now  that 
they  are  small,  the  iterates  tend  to  move  toward  the  perimeter  of  the  feasible 
region  and  thus  require  more  iterations.  Roughly  speaking,  a  typical  sequence 
of  iterates  might  be  as  follows: 

a)  The  initial  point  is  far  from  all  bounds. 

b)  The  search  direction  is  chosen  as  if  the  problem  were  unconstrained  (since 
the  barrier  terms  are  negligible). 
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c)  The  steplength  is  chosen  so  that  the  next  iterate  is  near  one  bound  but  far 
from  all  others. 

d)  The  computation  of  the  third  point  proceeds  in  a  similar  fashion  to  tire 
computation  of  the  second  point.  Since  only  one  barrier  term  is  significant, 
the  search  direction  is  computed  almost  as  if  only  one  bound  were  present. 

Similarly  for  the  subsequent  iterates.  To  some  extent  the  algorithm  acts  like  an 
active-set  method  where  the  initial  working  set  is  empty. 

To  overcome  these  difficulties,  we  follow  [Mar89]  and  introduce  a  linear  term 
into  the  barrier  function.  Specifically,  consider  the  QP 

min  cTx  -f  \xTHx 

X  z 

subject  to  Ax  —  b,  l  <  x, 
and  the  logarithmic/linear  transformation 

B(x,n)  =  min  cTx  +  \xTHx  -  p(£ln(x.,  -  lj)  -  u  f^{xj  -  /_,)), 

i=i  j=i 

where  v  >  0  (e.g.,  v  —  10-4).  Notice  that  B(x,p)  again  approaches  the  true 
solution  as  p  approaches  zero.  The  advantage  of  this  modified  barrier  transfor¬ 
mation  is  that  good  initial  points  can  be  more  easily  found.  In  particular,  the 
new  fraction  B(x,  p)  has  the  property  that 

lim  {x*(p)  -  lj)  =  -. 

This  is  in  contrast  to  the  standard  barrier  function,  which  is  unbounded.  A 
second  advantage  is  that  when  we  are  far  from  the  bounds,  the  gradients  of 
the  linear  terms  (though  small)  are  not  negligible  like  those  of  the  logarithmic 
terms.  This  has  the  advantage  of  making  the  code  somewhat  less  sensitive  to 
the  point  being  far  from  the  bounds.  (See  [M'5ri>9]  for  some  computational 
experiments  on  the  sensitivity  of  the  initial  point  fo:  •  cue.)  Therefore, 

we  choose  the  initial  point  (the  Xj's)  by  the  following  wtiicri&t 

0  if  Xj  is  a  free  variable, 

u:  —  Ijv  if  X:  lias  only  an  upper  bound, 

lj  ^ 

lj  +  ) /v  if  xj  has  only  an  lower  bound, 

.  +  *j)  if  xj  contains  upper  and  lower  bounds. 

Experimentally  we  have  found  that  this  choice  of  aro  behaves  well  over  a  wide 
range  of  problems. 
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•  Merit  function 

It  can  be  shown  that  the  search  direction  generated  by  the  KKT  system  is  a 
descent  direction  for  the  class  of  merit  f«*  '  given  by 

M(x, 

where  u?  is  a  weighting  parameter  (see  C  j-oter  3  for  the  case  u>  =  1).  In 
practice,  we  find  that  it  is  generally  better  .  iavor  feasibility  in  deciding  when 
to  backtrack.  Typically  we  choose  w  =  xO 

•  Adjustment  of  n 

The  strategy  adopted  for  adjusting  p  is  similar  to  that  described  in  [GMS*86]. 
Specifically,  / 1  is  decreased  by  a  factor  if  either  ||(</i.,r)|J  is  reduced  by  a  factor 
7  or  a  maximum  of  ten  iterations  is  reached.  In  the  results  reported,  both  ft 
and  7  were  0.3.  The  efficiency  of  the  algorithm  was  not  unduly  sensitive  to  this 
choice.  However,  a  more  elaborate  strategy  may  prove  to  be  more  efficient;  for 
example,  n  could  be  reduced  by  a  factor  that  depends  on  the  current  point. 

8.9  MINOS 

MINOS  is  a  Fortran-based  mathematical  programming  system  designed  to  solve  a 
wide  variety  of  large-scale  linear  and  nonlinear  programs  [MS83].  For  our  comparisons 
we  make  use  of  the  facilities  for  solving  optimization  problems  with  a  smooth  nonlinear 
objective  and  sparse  linear  constraints  [MS78]. 

As  in  the  implementation  of  our  barrier  algorithm,  linear  constraints  are  defined 
in  the  form  Ax  =  6,  /  <  x  <  u.  A  nonlinear  objective  F(x)  is  represented  by  a  Fortran 
subroutine  that  computes  both  F{x)  and  its  gradient  g(x)  for  any  given  feasible  x. 

/or  linearly  constrained  optimization,  MINOS  uses  a  reduced-gradient  algorithm. 
This  is  an  active-set  method  that  computes  a  search  direction  using  a  null-space 
approach.  (See  Chapter  2  for  a  discussion  of  active-set  and  null-space  methods.) 
The  constraint  matrix  is  partitioned  as  A  —  (  B  S  Ar )  where  D  is  nonsingular.  The 
associated  variables  are  termed  basic,  superb asic  and  nonbasic  respectively.  The 
aclive-set  matrix  ond  the  mui-space  matrix  then  take  the  form 


-r  (  B  S  N!\ 

M  >)’ 

z  = 

I 

<  0  J 

v-hicj;  satisfy  AZ  —  0  as  required. 


104 


Chapter  8.  Implementation  Issues  and  Numerical  Results 


MINOS  maintains  a  quasi-Newton  approximation  to  the  reduced  Hessian  in  the 
form  ZTHZ  «  RTR,  where  R  is  upper  triangular.  The  key  equations  for  computing 
a  search  direction  are  therefore 


RTRps  =  -ZTg,  p  =  Zps. 

A  sparse  LU  factorization  of  B  allows  ZTg  and  Zps  to  be  computed  efficiently,  but  R 
is  stored  as  a  dense  triangular  matrix  of  order  s  (the  number  of  superbasic  variables). 
Quasi-Newton  updates  to  R  involve  0(s2)  arithmetic  operations,  as  do  various  other 
updates  that  are  made  to  R  as  the  active  set  changes. 

In  many  cases  s  remains  “moderate”  (say  less  than  100)  and  the  linear  algebra 
involved  in  using  and  updating  R  is  not  excessive.  For  QP,  this  may  be  true  if 
only  a  few  of  the  variables  are  involved  in  the  Hessian,  or  if  the  linear  terms  in  the 
objective  dominate  the  quadratic  part.  However,  if  the  number  of  active  constraints  is 
substantially  less  than  the  number  of  variables,  i.e.,  if  s  is  large,  the  work  involved  in 
maintaining  R  is  correspondingly  substantial.  In  such  cases  we  can  expect  alternative 
methods  (such  as  a  range-space  approach  or  the  barrier  algorithm)  to  be  superior. 

8.10  The  Test  Environment 

Both  BARQP  and  MINOS  are  implemented  in  Fortran  77  and  were  run  on  cne 
processor  of  an  Alliant  FX/8  with  the  U  '<  operating  system.  The  test  runs  were 
made  as  background  jobs  when  the  machir  3  was  lightly  loaded.  Although  the  Alliant 
has  virtual  memory,  sufficient  real  memory  was  available  to  prevent  significant  paging 
activity.  The  recorded  GPU  times  should  be  accurate  to  within  about  1  per  cent. 
Times  are  measured  in  CPU  seconds. 

For  consistency  we  have  scaled  the  constraint  matrix  A  using  the  same  scaling 
routine  as  in  MINOS.  It  may  seem  that  all  of  the  KKT  system  should  be  scaled. 
However,  such  a  scaling  would  have  to  be  applied  at  every  iteration  (since  II  +  pD 
changes).  Notice  that  scaling  A  does  imply  a  symmetric  scaling  of  H. 

The  problems  were  solved  without  any  preprocessing  of  the  data  other  than  scal¬ 
ing.  Most  results  reported  in  the  literature  for  barrier  methods  are  on  problems  that 
have  been  preprocessed,  a  procedure  that  apart  from  reducing  the  size  of  the  problem 
can  significantly  improve  the  condition  of  the  constraint  matrix.  Although  prepro¬ 
cessing  may  be  a  sensible  first  step  in  practice,  our  purpose  has  been  to  try  and  test 
the  limits  of  our  method. 

8.11  Quadratic  Test  Problems 

A  large  collection  of  sparse  linear  programs  resides  in  the  Ip/data  chapter  of  netlib 
(Gay85j.  The  QP  test  problems  were  obtained  from  linear  programs  in  this  collection 
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Rows 

Columns 

Nonzeros 

SHARE2B 

97 

79 

730 

SHARE1B 

118 

225 

1182 

SCFXM1 

331 

457 

2612 

E226 

224 

282 

2767 

SCAGR25 

472 

500 

2029 

SHELL 

537 

1775 

4900 

SCTAP1 

301 

480 

2052 

SCSD1 

78 

760 

314S 

SCSD6 

148 

1350 

5666 

Table  8.1:  LP  test  problems  from  netlib 

by  introducing  a  sparse  matrix  H  into  the  objective.  Initially  we  planned  to  set 
FI  =  pi  for  some  fixed  positive  scalar  /?,  but  the  choice  of  p  was  soon  found  to  be 
problematical  because  of  the  widely  varying  scale  of  the  vector  c.  For  any  given  p, 
the  quadratic  objective 

minimize  cTx  +  \pxTx 

is  sometimes  essentially  the  same  as  crx,  while  in  other  cases  it  is  dominated  by  the 
term  pxTx. 

One  solution  would  have  been  to  set  p  to  be  a  constant  multiple,  of  ||c||.  For 
simplicity  we  chose  to  ignore  c  and  to  set  p  =  1,  thus  obtaining  the  “minimum- 
length”  objective 

minimize  xTx. 

*6Sn 

This  is  one  of  the  simplest  possible  quadratic  objective  functions,  but  it  provides  a 
well-posed  problem  that  often  has  physical  meaning.  Recall  that  the  objective  for 
any  convex  QP  may  be  converted  to  this  form  (with  some  help  of  some  additional 
constraints  and  variables),  though  only  some  of  the  variables  then  appear  in  the 
transformed  objective. 

Table  8.1  gives  details  of  the  constraint  matrix  A  for  a  representative  set  of  the  QP 
test  problems  that  were  solved.  Table  8.2  summarizes  the  performance  of  MINOS  and 
BARQP  on  these  problems.  The  iteration  counts  for  both  methods  are  included  for 
interest,  but  of  course  the  work  per  iteration  differs  substantially.  Oniy  the  execution 
times  should  be  compared. 

The  column  headed  s  shows  the  “degrees  of  freedom”  in  the  solution  obtained 
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QP  Problems 

MINOS 

BARQP 

Name 

s 

Itns 

Time 

Itns 

Time 

SHARE2B 

6 

116 

3.4 

31 

23.3 

SHARE1B 

7 

228 

8.9 

43 

32.9 

SCFXM1 

34 

509 

35.2 

37 

144.9 

E226 

72 

832 

53.0 

41 

182.1 

SCAGR25 

95 

610 

62.6 

30 

49.3 

SHELL 

188 

762 

119.5 

37 

72.9 

SCTAP1 

189 

821 

89.0 

34 

39.2 

SCSD1 

303 

955 

207.6 

25 

77.8 

SCSD6 

528 

1707 

1930.0 

32 

94.5 

Table  8.2:  Comparison  of  MINOS  and  BARQP  on  QP  test  problems 


by  MINOS,  as  measured  by  the  final  number  of  superbasic  variables.  This  is  the 
dimension  of  the  subspace  in  which  MINOS  applies  a  quasi* Newton  method  for  un¬ 
constrained  minimization,  using  a  dense  triangular  matrix  of  dimension  s. 

It  can  be  seen  that  as  s  grows,  the  efficiency  of  MINOS  degrades  substantially. 
In  contrast,  the  efficiency  of  BARQP  appears  to  depend  only  on  the  dimensions  of 
A.  We  can  conclude  that  barrier  algorithms  typified  by  BARQP  are  likely  to  be 
more  efficient  than  conventional  active-set  reduced-gradient  algorithms  (typified  by 
MINOS)  on  the  class  of  QP  problems  having  many  degrees  of  freedom. 

Regarding  linear  programs,  several  examples  in  the  collection  were  chosen  to  pro¬ 
vide  realistic  LP  test  problems.  The  implementation  is  capable  of  processing  the 
larger  netlib  examples,  but  the  execution  time  would  be  correspondingly  greater.  The 
iteration  counts  for  BARQP  are  comparable  to  those  obtained  by  primal  barrier  al¬ 
gorithms  elsewhere  (e.g.  [GMS*86,Mar89]). 

In  most  of  the  LP  problems  tried  in  our  experimentaticn,  the  computation  times 
for  BARQP  were  higher  than  for  the  simplex  implementation  in  MINOS.  The  expla¬ 
nation  for  this  behavior  lies  in  our  use  of  the  KKT  system  to  obtain  search  directions, 
and  the  use  of  MA2T  to  solve  each  system.  We  anticipate  the  BARQP  times  might  be 
reduced  significantly  by  the  use  of  MA47.  For  LP  problems,  H0  is  diagonal  and  vir¬ 
tually  all  existing  implementations  reduce  the  KKT  system  to  the  Schur-complement 
form  AD2ATAn  =  d,  where  D2  =  //“'  is  diagonal.  In  many  practical  cases,  AD2AT 
and  its  Cholesky  factors  are  very  sparse  and  can  be  computed  efficiently  with  state-of- 
the-art  software.  Our  use  of  the  full  KKT  system  rather  than  AD2AT  can  be  expected 
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to  provide  greater  reliability. 

8.12  Sparsity  in  H 

The  sparsity  of  H  affects  the  two  algorithms  in  different  ways.  In  MINOS  the  only 
requirement  is  to  evaluate  Hx  and  xTHx  at  each  trial  point.  If  H  were  in  fact  dense, 
the  work  would  increase  with  problem  size  as  a  function  of  n2,  whereas  in  BARQP  it 
would  be  of  order  n3. 

Thus  without  experimentation  we  can  say  that  a  large  dense  H  may  be  more 
efficiently  handled  by  MINOS,  unless  as  before  it  leads  to  a  large  number  of  degrees 
of  freedom.  In  general,  increased  sparsity  in  H  helps  both  methods. 

8.13  Conclusions 

The  purpose  of  the  experiments  has  not  been  to  conclude  that  BARQP  in  its  present 
state  is  a  highly  efficient  algorithm.  Our  goals  have  been  more  modest.  Firstly,  it  is 
comforting  that  the  number  of  iterations  taken  on  the  QP  problems  is  similar  to  those 
taken  on  the  LP  problems.  If  a  development  of  BARQP  is  to  be  competitive  with 
active-set  methods  for  QP  then  this  is  an  essential  requirement.  Secondly,  the  CPU 
times  for  BARQP  are  in  the  same  ballpark  as  those  of  the  active-set  method  MIN  OS. 
Undoubtedly  a  special-purpose  QP  active-set  method  may  perform  rather  better  than 
MINOS,  especially  on  a  problem  with  many  degrees  of  freedom.  It  is  also  highly  likely 
that  ;he  efficiency  of  BARQP  can  be  improved  dramatically.  A  comparison  of  such 
codes  :  ru.d  await  both  the  development  of  BARQP  and  the  provision  of  a  large-scale 
active-:  QP  code.  While  such  codes  do  exist  the  current  approaches  are  not  based 

on  the  >  nmetric  factorization  of  the  KI\T  matrix,  which  is  likely  to  be  the  most 
efficient  •  .  »proach.  An  interesting  characteristic  of  the  barrier  approach  for  QP  is 
that  the  ost  effective  methods  for  both  the  barrier  algorithm  and  the  active-set 
approach  i  se  the  same  matrix  factorization.  This  should  greatly  facilitate  numerical 
r  c  m  pa-iron. 

At  the  moment  the  only  approach  advocated  for  the  indefinite  case  is  based  on 
a  mimai  algorithm.  In  the  LP  case  it  is  known  that  primal-dual  methods  are  the 
most  effective  approach.  As  described  in  Chapter  6,  such  an  approach  is  possible  in 
the  case  of  convex  QP;  also  see  [CLMS90].  We  could  attempt  such  an  approach  in 
the  indefinite  case  if  we  could  be  assured  that  at  every  iteration  the  KKT  matrix  has 
the  required  inertia.  Failing  that  we  would  need  to  develop  methods  for  computing 
appropriate  directions  of  negative  curvature. 

At  the  commencement  of  this  work  it  was  uncertain  that  the  barrier  approach  to 
LP  could  be  extended  to  more  general  problems.  It  was  known  that  the  complexity 
results  did  extend  to  the  convex  QP  case,  but  this  was  of  little  comfort.  The  concern 
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was  whether  the  numerical  performance  of  such  algorithms  was  inherently  flawed, 
since  it  was  known  that  the  barrier  subproblems  are  in  general  ill-conditioned.  (LP 
was  the  exception  to  this  rule.)  We  can  now  assert  that  provided  the  barrier  trans¬ 
formation  is  applied  to  a  suitably  formulated  problem,  there  is  no  a  priori  reason 
for  assuming  that  the  performance  of  barrier  algorithms  on  general  problems  is  any 
different  from  that  for  the  LP  case.  We  anticipate  that  both  barrier  algorithms  and 
active-set  methods  will  play  a  significant  role  in  solving  large-scale  nonlinear  opti¬ 
mization  problems. 
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SOL  91-2:  Barrier  Methods  for  Large-Scale  Quadratic  Programming,  Dulce  B.  Ponceledn 
(June  1991,  117  pp.). 

We  present  several  new  algorithms  for  solving  the  general  large-scale  quadratic  programming  (QP) 
problem. 

A  feature  of  QP  problems  is  the  presence  of  linear  inequality  constraints,  which  introduce  a  com¬ 
binatorial  aspect  to  the  problem.  Currently  the  most  common  approach  to  solving  QP  problems  is  to 
apply  active-set  methods,  in  which  only  some  of  the  inequalities  are  used  to  produce  a  search  direction 
at  each  stage  The  combinatorial  element  is  therefore  inherent.  As  problems  become  larger,  there  is  a 
potential  for  an  excessive  number  of  iterations  and  consequent  inefficiency. 

In  contrast,  we  use  the  now  familiar  barrier-function  approach,  which  circumvents  the  combinatorial 
aspect  by  introducing  a  barrier  transformation  involving  all  of  the  inequalities.  The  barrier  term  enforces 
satisfaction  of  the  inequalities  by  modifying  the  objective  function.  The  transformed  problem  is  solved 
by  a  modified  Newton  Method  applied  to  the  nonlinear  equations  defining  feasibility  and  optimality. 

The  main  computation  at  each  iteration  of  the  new  algorithms  is  the  solution  of  an  indefinite 
system  of  linear  equations.  Barrier  methods  are  known  to  lead  to  ill-conditioned  systems.  However,  we 
show  by  a  special  sensitivity  analysis  that  the  particular  manner  in  which  we  have  formualted  the  barrier 
transformation  leads  to  ill-conditioning  that  is  benign. 

We  address  the  many  details  that  need  to  be  resolved  in  order  to  define  an  efficient  algorithm 
for  solving  large-scale  QP  problems.  A  specific  barrier  algorithm  has  been  implemented,  with  linear 
programming  (LP)  included  as  a  special  case.  Numerical  results  are  presented  for  a  set  of  sparse  QP 
test  problems.  A  feature  of  the  implementation  is  that  its  efficiency  does  not  depend  on  whether  the 
solutuion  is  near  or  far  from  a  vertex  of  the  feasible  region. 
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